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In  this  review  we  discuss,  from  a  unified  point  of  view,  a  variety  of  Monte 
Carlo  methods  used  to  solve  eigenvalue  problems  in  statistical  mechanics  and 
quantum  mechanics.  Although  the  applications  of  these  methods  differ  widely, 
the  underlying  mathematics  is  quite  similar  in  that  they  are  stochastic  imple¬ 
mentations  of  the  power  method.  In  all  cases,  optimized  trial  states  can  be 
used  to  reduce  the  errors  of  Monte  Carlo  estimates. 
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I.  INTRODUCTION 

Many  important  problems  in  computational  physics  and  chemistry  can  be  reduced  to  the 
computation  of  dominant  eigenvalues  of  matrices  of  high  or  infinite  order.  We  shall  focus  on 
just  a  few  of  the  numerous  examples  of  such  matrices,  namely,  quantum  mechanical  Hamil¬ 
tonians,  Markov  matrices  and  transfer  matrices.  Quantum  Hamiltonians,  unlike  the  other 
two,  probably  can  do  without  introduction.  Markov  matrices  are  used  both  in  equilibrium 
and  nonequilibrium  statistical  mechanics  to  describe  dynamical  phenomena.  Transfer  ma¬ 
trices  were  introduced  by  Kramers  and  Wannier  in  1941  to  study  the  two-dimensional  Ising 
model, H  and  ever  since,  important  work  on  lattice  models  in  classical  statistical  mechanics 
has  been  done  with  transfer  matrices,  producing  both  exact  and  numerical  results.! 

The  basic  Monte  Carlo  methods  reviewed  in  this  chapter  have  been  used  in  many  different 
contexts  and  under  many  different  names  for  many  decades,  but  we  emphasize  the  solution 
of  eigenvalue  problems  by  means  of  Monte  Carlo  methods  and  present  the  methods  from 
a  unified  point  of  view.  A  vital  ingredient  in  the  methods  discussed  here  is  the  use  of 
optimized  trial  functions.  Section  |1A]  deals  with  this  topic  briefly,  but  in  general  we  suppose 
that  optimized  trial  functions  are  given.  We  refer  the  reader  to  Ref.  for  more  details  on 
their  construction. 

The  analogy  of  the  time-evolution  operator  in  quantum  mechanics  on  the  one  hand,  and 
the  transfer  matrix  and  the  Markov  matrix  in  statistical  mechanics  on  the  other,  allows  the 
two  fields  to  share  numerous  techniques.  Specifically,  a  transfer  matrix  G  of  a  statistical 
mechanical  lattice  system  in  d  dimensions  often  can  be  interpreted  as  the  evolution  operator 
in  discrete,  imaginary  time  t  of  a  quantum  mechanical  analog  in  d  —  1  dimensions.  That 
is,  G  ~  exp(— tTi),  where  7i  is  the  Hamiltonian  of  a  system  in  d  —  1  dimensions,  the 
quantum  mechanical  analog  of  the  statistical  mechanical  system.  From  this  point  of  view, 
the  computation  of  the  partition  function  and  of  the  ground-state  energy  are  essentially  the 
same  problems:  finding  the  largest  eigenvalue  of  G  and  of  exp(— tH),  respectively.  As  far 
as  the  Markov  matrix  is  concerned,  this  simply  is  the  time-evolution  operator  of  a  system 
evolving  according  to  stochastic  dynamics.  The  largest  eigenvalue  of  such  matrices  equals 
unity,  as  follows  from  conservation  of  probability,  and  for  systems  in  thermal  equilibrium, 
the  corresponding  eigenstate  is  also  known,  namely  the  Boltzmann  distribution.  Clearly, 
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the  dominant  eigenstate  in  this  case  poses  no  problem.  For  nonequilibrium  systems,  the 
stationary  state  is  unknown  and  one  might  use  the  methods  described  in  this  chapter  in 
dealing  with  them.  Another  problem  is  the  computation  of  the  relaxation  time  of  a  system 
with  stochastic  dynamics.  This  problem  is  equivalent  to  the  computation  of  the  second 
largest  eigenvalue  of  the  Markov  matrix,  and  again  the  current  methods  apply. 

The  emphasis  of  this  chapter  is  on  methods  rather  than  applications,  but  the  reader 
should  have  a  general  idea  of  the  kind  of  problems  for  which  these  methods  can  be  employed. 
Therefore,  we  start  off  by  giving  some  specific  examples  of  the  physical  systems  one  can  deal 
with. 


A.  Quantum  Systems 


In  the  case  of  a  quantum  mechanical  system,  the  problem  in  general  is  to  compute  expec¬ 
tation  values,  in  particular  the  energy,  of  bosonic  or  fermionic  ground  or  excited  eigenstates. 
For  systems  with  n  electrons,  the  spatial  coordinates  are  denoted  by  a  3ri-dimensional  vec¬ 
tor  R.  In  terms  of  the  vectors  r,  specifying  the  coordinates  of  electron  number  i  this  reads 
R=(n,...,rn).  The  dimensionless  Hamiltonian  is  of  the  form 

(R|M|R'}  =  [-3-V2  +  V(R)]i(R  -  R').  (1) 

For  atoms  or  molecules  atomic  units  are  used  //  =  1.  and  V  is  the  usual  Coulomb  potential 
acting  between  the  electrons  and  between  the  electrons  and  nuclei  i.e., 


v(R)  =  E 

i<j 


(2) 


where  for  arbitrary  subscripts  a  and  b  we  define  rab  =  [rn  —  r&|  ;  indices  i  and  j  label  the 
electrons,  and  we  assume  that  the  nuclei  are  of  infinite  mass  and  that  nucleus  a  has  charge 
ZQ  and  is  located  at  position  rQ. 

In  the  case  of  quantum  mechanical  van  der  Waals  clusters, @11  //  is  the  reduced  mass  - 
/i  =  23‘mecr/h2  in  terms  of  the  mass  m,  Planck’s  constant  h  and  the  conventional  Lennard- 
Jones  parameters  e  and  a  —  and  the  potential  is  given  by 

v(r)  =  e4-J-  (3) 

i<j  ij  ij 

The  quantum  nature  of  the  system  increases  with  1//U2,  which  is  proportional  to  the  con¬ 
ventional  de  Boer  parameter. 

The  ground-state  wavefunction  of  a  bosonic  system  is  positive  everywhere,  which  is  very 
convenient  in  a  Monte  Carlo  context  and  allows  one  to  obtain  results  with  an  accuracy  that 
is  limited  only  by  practical  considerations.  For  fermionic  systems,  the  ground-state  wave 
function  has  nodes,  and  this  places  more  fundamental  limits  on  the  accuracy  one  can  obtain 
with  reasonable  effort.  In  the  methods  discussed  in  this  chapter,  this  bound  on  the  accuracy 
takes  the  form  of  the  so-called  fixed-node  approximation.  Here  one  assumes  that  the  nodal 
surface  is  given,  and  computes  the  ground-state  wave  function  subject  to  this  constraint. 
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The  time-evolution  operator  exp(— tTL)  in  the  position  representation  is  the  Green  func¬ 
tion 


G(R',  R,  r)  =  (R'|e_TW|R).  (4) 

For  both  bosonic  systems  and  fermionic  systems  in  the  fixed-node  approximation,  G  has 
only  nonnegative  elements.  This  is  essential  for  the  Monte  Carlo  methods  discussed  here.  A 
problem  specific  to  quantum  mechanical  systems  is  that  G  is  only  known  asymptotically  for 
short  times,  so  that  the  finite-time  Green  function  has  to  be  constructed  by  the  application 
of  the  generalized  Trotter  formulaifl,  i.e.,  G(r)  =  lim^^oo  G(r/m)m,  where  the  position 
variables  of  G  have  been  suppressed. 


B.  Transfer  Matrices 

Our  next  example  is  the  transfer  matrix  of  statistical  mechanics.  The  largest  eigenvalue 
yields  the  free  energy,  from  which  all  thermodynamic  properties  follow.  As  a  typical  transfer 
matrix,  one  can  think  of  the  one-site,  Kramers- Wannier  transfer  matrix  for  a  two-dimensional 
model  of  Ising  spins,  s*  =  ±1.  Such  a  matrix  takes  a  particularly  simple  form  for  a  square 
lattice  wrapped  on  a  cylinder  with  helical  boundary  conditions  with  pitch  one.  This  produces 
a  mismatch  of  one  lattice  site  for  a  path  on  the  lattice  around  the  cylinder.  This  geometry 
has  the  advantage  that  a  two-dimensional  lattice  can  be  built  one  site  at  a  time  and  that 
the  process  of  adding  each  single  site  is  identical  each  time.  Suppose  we  choose  a  lattice  of 
M  sites,  wrapped  on  a  cylinder  with  a  circumference  of  L  lattice  spaces.  Imagine  that  we 
are  adding  sites  so  that  the  lattice  grows  toward  the  left.  We  can  then  define  a  conditional 
partition  function  Zm (S),  which  is  a  sum  over  those  states  (also  referred  to  as  configurations) 
for  which  the  left-most  edge  of  the  lattice  is  in  a  given  state  S.  The  physical  interpretation 
of  Zm( S)  is  the  relative  probability  of  finding  the  left-most  edge  in  a  state  S  with  which  the 
rest  of  the  lattice  to  its  right  is  in  thermal  equilibrium. 

If  one  has  helical  boundary  conditions  and  spins  that  interact  only  with  their  nearest 
neighbors,  one  can  repeatedly  add  just  a  single  site  and  the  bonds  connecting  it  to  its 
neighbors  above  and  to  the  right.  Analogously,  the  transfer  matrix  G  can  be  used  to  compute 
recursively  the  conditional  partition  function  of  a  lattice  with  one  additional  site 

Zm+1(S')=£G(S'|S)Zm(S),  (5) 

s 


with 


G(S'|S)  =  exp  [A' (si  si  +  sisL)]  (6) 

i= 2 

with  S  =  (si,  s2,  •  •  • ,  sl)  and  S'  =  (si,  s'2, . . . ,  s'L),  and  the  S;,  s'  =  ±1  are  Ising  spins.  With 
this  definition  of  the  transfer  matrix,  the  matrix  multiplication  in  Eq.  ([5|)  accomplishes  the 
following:  (1)  a  new  site,  labeled  1,  is  appended  to  the  lattice  at  the  left  edge;  (2)  the 
Boltzmann  weight  is  updated  to  that  of  the  lattice  with  increased  size;  (3)  the  old  site  L 
is  thermalized;  and  finally  (4)  old  sites  1, . . . ,  L  —  1  are  pushed  down  on  the  stack  and  are 
renamed  to  2, . . . ,  L.  Sites  have  to  remain  in  the  stack  until  all  interacting  sites  have  been 
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FIG.  1.  Graphical  representation  of  the  transfer  matrix.  The  primed  variables  are  associated  with  the 
circles  and  combine  into  the  left  index  of  the  matrix;  the  dots  go  with  the  right  index  and  the  unprimed 
variables.  Coincidence  of  a  circle  and  a  dot  produces  a  5-function.  An  edge  indicates  a  contribution  to  the 
Boltzmann  weight.  Repeated  application  of  this  matrix  constructs  a  lattice  with  nearest  neighbor  bonds 
and  helical  boundary  conditions. 

added,  which  determines  the  minimal  depth  of  the  stack.  It  is  clear  from  Figure  [I]  that 
the  transfer  matrix  is  nonsymmetric  and  indeed  symmetry  is  not  required  for  the  methods 
discussed  in  this  chapter.  It  is  of  some  interest  that  transfer  matrices  usually  have  the 
property  that  a  right  eigenvector  can  be  transformed  into  a  left  eigenvector  by  a  simple 
permutation  and  reweighting  transformation.  The  details  are  not  important  here  and  let 
it  suffice  to  mention  that  this  follows  from  an  obvious  symmetry  of  the  diagram  shown 
in  Figure  |l|:  (1)  rotate  over  7 r  about  an  axis  perpendicular  to  the  paper,  which  permutes 
the  states;  and  (2)  move  the  vertical  bond  back  to  its  original  position,  which  amounts  to 
reweighting  by  the  Boltzmann  weight  of  a  single  bond. 

Equation  (|5j)  implies  that  for  large  M  and  generic  boundary  conditions  at  the  right-hand 
edge  of  the  lattice,  the  partition  function  approaches  a  dominant  right  eigenvector  -0o  of  the 
transfer  matrix  G 


ZM( S)  cx  A^o(S),  (7) 

where  Ao  is  the  dominant  eigenvalue.  Consequently,  for  M  — >  oo  the  free  energy  per  site  is 
given  by 


/  =  — AfTlnAo.  (8) 

The  problem  relevant  to  this  chapter  is  the  computation  of  the  eigenvalue  Ao  by  Monte 
Carlo.! 


C.  Markov  Matrices 

Discrete-time  Markov  processes  are  a  third  type  of  problem  we  shall  discuss.  One  of  the 
challenges  in  this  case  is  to  compute  the  correlation  time  of  such  a  process  in  the  vicinity 
of  a  critical  point,  where  the  correlation  time  goes  to  infinity,  a  phenomenon  called  “critical 
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slowing  down”.  Computationally,  the  problem  amounts  to  the  evaluation  of  the  second 
largest  eigenvalue  of  the  Markov  matrix,  or  more  precisely  its  difference  from  unity.  The 
latter  goes  to  zero  as  the  correlation  time  approaches  infinity. 

The  Markov  matrix  defines  the  stochastic  evolution  of  the  system  in  discrete  time.  That 
is,  suppose  that  at  time  t  the  probability  of  finding  the  system  in  state  S  is  given  by  pt( S). 
If  the  probability  of  making  a  transition  from  state  S  to  state  S'  is  P(S'|S)  (sorry  about  the 
hat,  we  shall  take  it  off  soon!),  then 

p1+1(s')  =  yp(s'|s)ft(s)  (9) 

s 

In  the  case  of  interest  here,  the  Markov  matrix  P  is  constructed  so  that  its  stationary  state 
is  the  Boltzmann  distribution  =  exp(— (37i).  Sufficient  conditions  are  that  (a)  each  state 
can  be  reached  from  every  state  in  a  finite  number  of  transitions  and  that  (b)  P  satisfies 
detailed  balance 

P(S'|S)^B(S)2  =  P(S|S>b(S')2.  (10) 

It  immediately  follows  from  detailed  balance  that  the  matrix  P  defined  by 

P(S'|S)  =  -i— P(  S'|SW>B(S).  (11) 

Pb(o') 

is  symmetric  and  equivalent  by  similarity  transformation  to  the  original  Markov  matrix. 
Because  of  this  symmetry,  expressions  tend  to  take  a  simpler  form  when  P  is  used,  as  we 
shall  do,  but  one  should  keep  in  mind  that  P  itself  is  not  a  Markov  matrix,  since  the  sum 
on  its  left  index  does  not  yield  unity  identically. 

Again,  to  provide  a  specific  example,  we  mention  that  the  methods  discussed  below  have 
been  applied!®  to  an  Ising  model  on  a  square  lattice  with  the  heat  bath  or  Yang11  transition 
probabilities  and  random  site  selection.  In  that  case,  time  evolution  takes  place  by  single 
spin-flip  transitions  which  occur  between  a  given  state  S  to  one  of  the  states  S'  that  differ 
only  at  one  randomly  selected  site.  For  any  such  pair  of  states,  the  transition  probability  is 
given  by 


1  e-?lian  f  O/  C 

P(S'|S)  =  <  27Vcoshi/3A H  °  ^  °  (12) 

l  l-Es^s^(S"|S)  for  S'  =  S, 

for  a  system  of  N  sites  with  Hamiltonian 

-/3H  =  KYj  +  K'  J2  sisi  ,  (13) 

(*J)  (MV 

where  (i,j)  denotes  nearest-neighbor  pairs,  and  ( i,j)'  denotes  next-nearest- neighbor  pairs 
and  AH  =  H(S')-H(S). 

We  note  that  whereas  the  transfer  matrix  is  used  to  deal  with  systems  that  are  infi¬ 
nite  in  one  direction,  the  systems  used  in  the  dynamical  computations  are  of  finite  spatial 
dimensions  only. 
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II.  THE  POWER  METHOD 


Before  discussing  technical  details  of  Monte  Carlo  methods  to  compute  eigenvalues  and 
expectation  values,  we  introduce  the  mathematical  ideas  and  the  types  of  expressions  for 
which  statistical  estimates  are  sought.  We  formulate  the  problem  in  terms  of  an  operator 
G  of  which  one  wants  to  compute  the  dominant  eigenstate  and  eigenvalue,  |^0)  and  A0. 
Mathematically,  but  not  necessarily  in  a  Monte  Carlo  setting,  dominant  may  mean  dominant 
relative  to  eigenstates  of  a  given  symmetry  only. 

The  methods  to  be  discussed  are  variations  of  the  power  method,  which  relies  on  the  fact 
that  for  a  generic  initial  state  | w®)  of  the  appropriate  symmetry,  the  states  | u®)  defined  by 

|  «(m))  =  —G\u{t))  (14) 

ct+i 

converge  to  the  dominant  eigenstate  |^o)  of  G,  if  the  constants  ct  are  chosen  so  that  | u^) 
assumes  a  standard  form,  in  which  case  the  constants  ct  converge  to  the  dominant  eigenvalue. 
This  follows  immediately  by  expanding  the  initial  state  | u^)  in  eigenstates  of  G.  One 
possible  standard  form  is  that,  in  some  convenient  representation,  the  component  of  | m^) 
largest  in  magnitude  equals  unity. 

For  quantum  mechanical  systems,  G  usually  is  the  imaginary-time  evolution  operator, 
exp(— tTL).  As  mentioned  above,  a  technical  problem  in  that  case  is  that  an  explicit  expres¬ 
sion  is  known  only  asymptotically  for  short  times  r.  In  practice,  this  asymptotic  expression 
is  used  for  a  small  but  finite  r  and  this  leads  to  systematic,  time-step  errors.  We  shall  deal 
with  this  problem  below  at  length,  but  ignore  it  for  the  time  being. 

The  exponential  operator  exp(— tH)  is  one  of  various  alternatives  that  can  be  employed 
to  compute  the  ground-state  properties  of  the  Hamiltonian.  If  the  latter  is  bounded  from 
above,  one  may  be  able  to  use  1  —  t'H,  where  r  should  be  small  enough  that  A0  =  1  —  tE0 
is  the  dominant  eigenvalue  of  II  —  t'H.  In  this  case,  there  is  no  time-step  error  and  the  same 
holds  for  yet  another  method  of  inverting  the  spectrum  of  the  Hamiltonian,  viz.  the  Green 
function  Monte  Carlo  method.  There  one  uses  (H  —  E)~l,  where  E  is  a  constant  chosen 
so  that  the  ground  state  becomes  the  dominant  eigenstate  of  this  operator.  In  a  Monte 
Carlo  context,  matrix  elements  of  the  respective  operators  are  proportional  to  transition 
probabilities  and  therefore  have  to  be  non-negative,  which,  if  one  uses  either  of  the  last  two 
methods,  may  impose  further  restrictions  on  the  values  of  r  and  E. 

For  the  statistical  mechanical  applications,  the  operators  G  are  indeed  evolution  oper¬ 
ators  by  construction.  The  transfer  matrix  evolves  the  physical  system  in  a  spatial  rather 
than  time  direction,  but  this  spatial  direction  corresponds  to  time  from  the  point  of  view 
of  a  Monte  Carlo  time  series.  With  this  in  mind,  we  shall  refer  to  the  operator  G  as  the 
evolution  operator,  or  the  Monte  Carlo  evolution  operator,  if  it  is  necessary  to  distinguish 
it  from  the  usual  time-evolution  operator  exp  {—t'H). 

Suppose  that  X  is  an  operator  of  which  one  wants  to  compute  an  expectation  value. 
Particularly  simple  to  deal  with  are  the  cases  in  which  the  operators  X  and  G  are  the  same 
or  commute.  We  introduce  the  following  notation.  Suppose  that  | ua)  and  \up)  are  two 
states,  then  X^p'1  denotes  the  matrix  element 

y(p’,p)  _  Mg XGE\ up) 
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This  definition  is  chosen  to  simplify  the  discussion,  and  generalization  to  physically  relevant 
expressions,  such  as  Eq.  |46],  is  straightforward. 

Various  Monte  Carlo  methods  are  designed  to  estimate  particular  instances  of  X^’p\ 
and  often  the  ultimate  goal  is  to  compute  the  expectation  value  in  the  dominant  eigenstate 


Xn  = 


famo) 

{i'oli'o) 


(16) 


which  reduces  to  an  expression  for  the  dominant  eigenvalue  of  interest  if  one  chooses  for 
A",  in  the  applications  discussed  in  the  introduction,  the  Hamiltonian,  transfer  or  Markov 
matrix. 

The  simplest  method  is  the  variational  Monte  Carlo  method,  discussed  in  the  next  sec¬ 
tion.  Here  an  approximate  expectation  value  is  computed  by  employing  an  approximate 
eigenvector  of  G.  Typically,  this  is  an  optimized  trial  state,  say  |wt),  in  which  case  vari¬ 
ational  Monte  Carlo  yields  \  which  is  simply  the  expectation  value  of  A"  in  the  trial 
state.  Clearly,  variational  Monte  Carlo  estimates  of  Xo  have  both  systematic  and  statistical 
errors. 

The  variational  error  can  be  removed  asymptotically  by  projecting  out  the  dominant 
eigenstate,  i.e.,  by  reducing  the  spectral  weight  of  sub-dominant  eigenstates  by  means  of 
the  power  method.  The  simplest  case  is  obtained  if  one  applies  the  power  method  only  to 
the  right  on  the  state  | up)  but  not  to  the  left  on  (ua\  in  Eq.  (15).  Mathematically,  this 
is  the  essence  of  diffusion  and  transfer  matrix  Monte  Carlo,  and  in  this  way  one  obtains 
the  desired  result  Xo  if  the  operator  X  commutes  with  the  Monte  Carlo  evolution  operator 
G.  In  our  notation,  this  means  that  X0  is  given  by  the  statistical  estimate  of  X^0f°°\  In 
principle,  this  yields  an  unbiased  estimate  of  X0,  but  in  practice  one  has  to  choose  p  finite 
but  large  enough  that  the  estimated  systematic  error  is  less  than  the  statistical  error.  In 
some  practical  situations  it  can  in  fact  be  difficult  to  ascertain  that  this  indeed  is  the  case. 
If  one  is  interested  in  the  limiting  behavior  for  infinite  p  or  p1,  the  state  \ua)  or  \up)  need  not 
be  available  in  closed  form.  This  freedom  translates  into  the  flexibility  in  algorithms  design 
exploited  in  diffusion  and  transfer  matrix  Monte  Carlo. 

If  G  and  X  do  not  commute,  the  mixed  estimator  Xj?f°°')  is  not  the  desired  result,  but  the 
residual  systematic  error  can  be  reduced  by  combining  the  variational  and  mixed  estimates 
by  means  of  the  expression 


Aq  -  2Arprp 


(0,oo) 


A'ft01  +  0[( !</>„)  -  |«.T)): 


(17) 


To  remove  the  variational  bias  systematically,  if  G  and  X  do  not  commute,  the  power 
method  must  be  used  to  both  the  left  and  the  right  in  Eq.  (|I5|).  Thus  one  obtains  from 
Xj^,oc>  an  exact  estimate  of  X0  subject  only  to  statistical  errors.  Of  course,  one  has  to  pay 
the  price  of  the  implied  double  limit  in  terms  of  loss  of  statistical  accuracy.  In  the  context 
of  the  Monte  Carlo  algorithms  discussed  below,  such  as  diffusion  and  transfer  matrix  Monte 
Carlo,  this  double  projection  technique  to  estimate  X^'oc'1  is  called  forward  walking  or  future 
walking. 

We  end  this  section  on  the  power  method  with  a  brief  discussion  of  the  computational 
complexity  of  using  the  Monte  Carlo  method  for  eigenvalue  problems.  In  Monte  Carlo 
computations  one  can  distinguish  operations  of  three  levels  of  computational  complexity, 
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depending  on  whether  the  operations  have  to  do  with  single  particles  or  lattice  sites,  the 
whole  system,  or  state  space  summation  or  integration.  The  first  typically  involves  a  fixed 
number  of  elementary  arithmetic  operations,  whereas  this  number  clearly  is  at  least  pro¬ 
portional  to  the  system  size  in  the  second  case.  Exhaustive  state-space  summation  grows 
exponentially  in  the  total  system  size,  and  for  these  problems  Monte  Carlo  is  often  the  only 
viable  option. 

Next,  the  convergence  of  the  power  method  itself  comes  into  play.  The  number  of  iter¬ 
ations  required  to  reach  a  certain  given  accuracy  is  proportional  to  log  |  Ao / Ai  | ,  where  the 
Ao  and  Ai  are  the  eigenvalues  of  largest  and  second  largest  magnitude.  If  one  is  dealing 
with  a  single-site  transfer  matrix  of  a  critical  system,  that  spectral  gap  is  proportional  to 
L~d  for  a  system  in  d  dimensions  with  a  cross  section  of  linear  dimension  L.  In  this  case,  a 
single  matrix  multiplication  is  of  the  complexity  of  a  one-particle  problem.  In  contrast,  both 
for  the  Markov  matrix  defined  above,  and  the  quantum  mechanical  evolution  operator,  the 
matrix  multiplication  itself  is  of  system-size  complexity.  Moreover,  both  of  these  operators 
have  their  own  specific  problems.  The  quantum  evolution  operator  of  G(r)  has  a  gap  on 
the  order  of  r,  which  means  that  r  should  be  chosen  large  for  rapid  convergence,  but  one 
does  not  obtain  the  correct  results,  because  of  the  time-step  error,  unless  r  is  small.  Finally, 
the  spectrum  of  the  Markov  matrix  displays  critical  slowing  down.  This  means  that  the  gap 
of  the  single  spin- flip  matrix  is  on  the  order  of  L~d~z,  where  z  is  typically  a  little  bigger 
than  twoB  These  convergence  properties  are  well  understood  in  terms  of  the  mathematics 
of  the  power  method.  Not  well  understood,  however,  are  problems  that  are  specific  to  the 
Monte  Carlo  implementation  of  this  method,  which  in  some  form  or  another  introduces 
multiplicative  fluctuating  weights  that  are  correlated  with  the  quantities  of  interest ,113’Ej 


III.  SINGLE-THREAD  MONTE  CARLO 


In  the  previous  section  we  have  presented  the  mathematical  expressions  that  can  be 
evaluated  with  the  Monte  Carlo  algorithms  to  be  discussed  next.  The  Erst  algorithm  is 
designed  to  compute  an  approximate  statistical  estimate  of  the  matrix  element  Ao  by  means 
of  the  variational  estimate  We  writcQ  (S|«t)  =  (wt|S)  =  Wt(S)  and  for  non¬ 

vanishing  wT(S)  define  the  configurational  eigenvalue  XT(S)  by 

XT(S)«T(S)  =  <S|A>t)  =  £<S|A|S'XS>t>  (18) 

S' 


This  yields 

^(o,o)  _  EsMS)2Xt(S) 

£s«t(S)2 


(19) 


*We  assume  throughout  that  the  states  we  are  dealing  with  are  represented  by  real  numbers 
and  that  X  is  represented  by  a  real,  symmetric  matrix.  In  many  cases,  generalization  to  complex 
numbers  is  trivial,  but  for  some  physical  problems,  while  formal  generalization  may  still  be  possible, 
the  resulting  Monte  Carlo  algorithms  may  be  too  noisy  to  be  practical. 
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which  shows  that  XTT  can  be  evaluated  as  a  time  average  over  a  Monte  Carlo  time  series  of 
states  Si,  S2, . . .  sampled  from  the  probability  distribution  wT( S)2,  he.,  a  process  in  which 
Prob(S),  the  probability  of  finding  a  state  S  at  any  time  is  given  by 

Prob(S)  oc  «t(S)2.  (20) 

For  such  a  process,  the  ensemble  average  in  Eq.  ©  can  be  written  in  the  form  of  a  time 
average 


4T  =  ,lim  yEXrtS,).  (21) 

L^°o  L  t=1 

For  this  to  be  of  practical  use,  it  has  to  be  assumed  that  the  configurational  eigenvalue 
Xt(S)  can  be  computed  efficiently,  which  is  the  case  if  the  sum  over  states  S'  in  (S|X|ut)  = 
X)s,(S|-X’|S/)(S/|wt)  can  be  performed  explicitly.  For  discrete  states  this  means  that  X 
should  be  represented  by  a  sparse  matrix;  if  the  states  S  form  a  continuum,  XT(S)  can 
be  computed  directly  if  A"  is  diagonal  or  near- diagonal,  he.,  involves  no  or  only  low-order 
derivatives  in  the  representation  used.  The  more  complicated  case  of  an  operator  X  with 
arbitrarily  nonvanishing  off-diagonal  elements  will  be  discussed  at  the  end  of  this  section. 

An  important  special  case,  relevant  for  example  to  electronic  structure  calculations,  is 
to  choose  for  the  operator  X  the  Hamiltonian  7i  and  for  S  the  3N- dimensional  real-space 
configuration  of  the  system.  Then,  the  quantity  XT  is  called  the  local  energy,  denoted  by  EL. 
Clearly,  in  the  ideal  case  that  Ut)  is  an  exact  eigenvector  of  the  evolution  operator  G,  and 
if  X  commutes  with  G  then  the  configurational  eigenvalue  AA(S)  is  a  constant  independent 
of  S  and  equals  the  true  eigenvalue  of  X.  In  this  case  the  variance  of  the  Monte  Carlo 
estimator  in  Eq.  (|2T|)  goes  to  zero,  which  is  an  important  zero-variance  principle  satisfied  by 
variational  Monte  Carlo.  The  practical  implication  is  that  the  efficiency  of  the  Monte  Carlo 
computation  of  the  energy  can  be  improved  arbitrarily  by  improving  the  quality  of  the  trial 
function.  Of  course,  usually  the  time  required  for  the  computation  of  Wt(S)  increases  as  the 
approximation  becomes  more  sophisticated.  For  the  energy  the  optimal  choice  minimizes 
the  product  of  variance  and  time;  no  such  optimum  exists  for  an  operator  that  does  not 
commute  with  G  or  if  one  makes  the  fixed  node  approximation,  described  in  Section  |Vi  A  2, 
since  in  these  cases  the  results  have  a  systematic  error  that  depends  on  the  quality  of  the 
trial  wavefunction. 


A.  Metropolis  Method 

A  Monte  Carlo  process  sampling  the  probability  distribution  wT(S)2  is  usually  generated 
by  means  of  the  generalized  Metropolis  algorithm,  as  follows.  Suppose  a  configuration  S  is 
given  at  time  t  of  the  Monte  Carlo  process.  A  new  configuration  S'  at  time  t  + 1  is  generated 
by  means  of  a  stochastic  process  that  consists  of  two  steps:  (1)  an  intermediate  configuration 
S"  is  proposed  with  probability  n(S"|S);  (2. a)  S'  =  S"  with  probability  p  =  A(S"|S),  i.e., 
the  proposed  configuration  is  accepted;  (2.b)  S'  =  S  with  probability  q  =  1  —  A(S"|S),  i.e., 
the  proposed  configuration  is  rejected  and  the  old  configuration  S  is  promoted  to  time  t  +  1. 
More  explicitly,  the  Monte  Carlo  sample  is  generated  by  means  of  a  Markov  matrix  P  with 
elements  P(S'|S)  of  the  form 
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P(S'|S) 


A(S'|S)  n(S/|S)  for  S'  ^  S 

1  -  £s,vs  A(S"|S)  n(S"|S)  for  S'  =  S, 


(22) 


if  the  states  are  discrete  and 


p(s'|s)  =  A{  s'|s)  n(s'|s)  + 


1 


dSM(S"|S)  n(S"|S) 


5(S'  -  S) 


(23) 


if  the  states  are  continuous.  Correspondingly,  in  Eq.  (|22|)  fl  and  P  are  probabilities  while 
in  Eq.  (|23|)  they  are  probability  densities. 

The  Markov  matrix  P  is  designed  to  satisfy  detailed  balance 

P(S'|S)MS)2  =  P(S|S')«t(S')2,  (24) 


so  that,  if  the  process  has  a  unique  stationary  distribution,  this  will  be  mt(S)2,  as  desired. 
In  principle,  one  has  great  freedom  in  the  choice  of  the  proposal  matrix  fl,  but  it  is  necessary 
to  satisfy  the  requirement  that  transitions  can  be  made  between  (almost)  any  pair  of  states 
with  nonvanishing  probability  (density)  in  a  finite  number  of  steps. 

Once  a  proposal  matrix  fl  is  selected,  an  acceptance  matrix  is  defined  so  that  detailed 
balance,  Eq.  (|24]),  is  satisfied 


A{  S'|S) 


mm 


n(s|s')  nT(s')2 
n(S'|S)  uT(S)2 


(25) 


For  a  given  choice  of  fl,  infinitely  many  choices  can  be  made  for  A  that  satisfy  detailed 
balance  but  the  preceding  choice  is  the  one  with  the  largest  acceptance.  We  note  that  if  the 
preceding  algorithm  is  used,  then  A"t(S4)  in  the  sum  in  Eq.  (|2Tp ,  can  be  replaced  by  the 
expectation  value  conditional  on  S"  having  been  proposed: 


v(°>°) 

yV  rprp 


iim  -i>xT(s;')  +  ?,w(s,)], 


t= l 


(26) 


where  rpt  —  l  —  qf  is  the  probability  of  accepting  S".  This  has  the  advantage  of  reducing  the 
statistical  error  somewhat,  since  now  A"t(S(')  contributes  to  the  average  even  for  rejected 
moves,  and  will  increase  efficiency  if  A"t(S")  is  readily  available. 

If  the  proposal  matrix  fl  is  symmetric,  as  is  the  case  if  one  samples  from  a  distribution 
uniform  over  a  cube  centered  at  S,  such  as  in  the  original  Metropolis  method0,  the  factors 
of  fl  in  the  numerator  and  denominator  of  Eq.  (25)  cancel. 

Finally  we  note  that  it  is  not  necessary  to  sample  the  distribution  u\  to  compute  Xtt: 
any  distribution  that  has  sufficient  overlap  with  u\  will  do.  To  make  this  point  more 
explicitly,  let  us  introduce  the  average  of  some  stochastic  variable  X  with  respect  to  an 
arbitrary  distribution  p : 


_Ssjr(S)p(S) 

'  Esp(S) 


(27) 


The  following  relation  shows  that  the  desired  results  can  be  obtained  by  reweighting,  he., 
any  distribution  p  will  suffice  as  long  as  the  ratio  Wt(S)2/p(S)  does  not  fluctuate  too  wildly: 
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v(0,0) 

y\ 


(A'm \!p)p 

(utI  P)p 


(28) 


=  (A')„=  = 


This  is  particularly  useful  for  calculation  of  the  difference  of  expectation  values  with  respect 
to  two  closely  related  distributions.  An  example  of  tlnffi  is  the  calculation  of  the  energy 
of  a  molecule  as  a  function  of  the  inter-nuclear  distance. 


B.  Projector  Monte  Carlo  and  Importance  Sampling 


The  generalized  Metropolis  method  is  a  very  powerful  way  to  sample  an  arbitrary  given 
distribution,  and  it  allows  one  to  construct  infinitely  many  Markov  processes  with  the  desired 
distribution  as  the  stationary  state.  None  of  these,  however,  may  be  appropriate  to  design 
a  Monte  Carlo  version  of  the  power  method  to  solve  eigenvalue  problems.  In  this  case,  the 
evolution  operator  G  is  given,  possibly  in  approximate  form,  and  its  dominant  eigenstate 
may  not  be  known.  To  construct  an  appropriate  Monte  Carlo  process,  the  first  problem 
is  that  G  itself  is  not  a  Markov  matrix,  i.e.,  it  may  violate  one  or  both  of  the  properties 
G(S'|S)  >  0  and  J2s'  G(S'|S)  =  1.  This  problem  can  be  solved  if  we  can  find  a  factorization 
of  the  evolution  matrix  G  into  a  Markov  matrix  P  and  a  weight  matrix  g  with  non-negative 
elements  such  that 


G(S'|S)=S(S'|S)P(S'|S). 


(29) 


The  weights  g  must  be  finite,  and  this  almost  always  precludes  use  of  the  Metropolis  method 
for  continuous  systems,  as  can  be  understood  as  follows.  Since  there  is  a  finite  probabil¬ 
ity  that  a  proposed  state  will  be  rejected,  the  Markov  matrix  P(S'|S)  will  contain  terms 
involving  S( S  —  S'),  but  generically,  G  will  not  have  the  same  structure  and  will  not  allow 
the  definition  of  finite  weights  g  according  to  Eq.  (^).  However,  the  Metropolis  algorithm 
can  be  incorporated  as  a  component  of  an  algorithm  if  an  approximate  stationary  state  is 
known  and  if  further  approximations  are  made,  as  in  the  diffusion  Monte  Carlo  algorithm 
discussed  in  Section  |VIB|. 

As  a  comment  on  the  side  we  note  that  violation  of  the  condition  that  the  weight  g  be 
positive  results  in  the  notorious  sign  problem,  in  one  of  its  guises,  which  is  in  most  cases 
unavoidable  in  the  treatment  of  fermionic  or  frustrated  systems.  Many  ingenious  attempts^ 
have  been  made  to  solve  this  problem,  but  this  is  still  a  topic  of  active  research.  However, 
as  mentioned,  we  restrict  ourselves  in  this  chapter  to  the  case  of  evolution  operators  G  with 
nonnegative  matrix  elements  only. 

We  resume  our  discussion  of  the  factorization  given  in  Eq.  (|29|).  Suppose  for  the  sake  of 
argument  that  the  left  eigenstate  ipo  °f  G  is  known  and  that  its  elements  are  positive, 


^^o(S,)G(S/|S)  =  A0^o(S). 

S' 


(30) 


If  in  addition,  the  matrix  elements  of  G  are  nonnegative,  the  following  matrix  P  is  a  Markov 
matrix 


P(S'|S) 


-U/'o(S')G(S'|S) 

^0 


1 

5(s )’ 


(31) 
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Unless  one  is  dealing  with  a  Markov  matrix  from  the  outset,  the  left  eigenvector  of  G  is 
seldom  known,  but  it  is  convenient,  in  any  event,  to  perform  a  so-called  importance  sampling 
transformation  on  G.  For  this  purpose  we  introduce  a  guiding  function  ug  and  define 

G(S'|S)  =  ug(S')G(S'|S)-t-  (32) 

We  shall  return  to  the  issue  of  the  guiding  function,  but  for  the  time  being  the  reader  can 
think  of  it  either  as  an  arbitrary,  positive  function,  or  as  an  approximation  to  the  dominant 
eigenstate  of  G.  From  a  mathematical  point  of  view,  anything  that  can  be  computed  with 
the  original  Monte  Carlo  evolution  operator  G  can  also  be  computed  with  G ,  since  the  two 
represent  the  same  abstract  operator  in  a  different  basis.  The  representations  differ  only  by 
normalization  constants.  All  we  have  to  do  is  to  write  all  expressions  derived  above  in  terms 
of  this  new  basis. 

We  continue  our  discussion  in  terms  of  the  transform  G  and  replace  Eq.  (|29j)  by  the 
factorization 


G(S'|S)  =  g(S'|S)P(S'|S)  (33) 

and  we  assume  that  P  has  the  explicitly  known  distribution  u2  as  its  stationary  state.  The 
guiding  function  ug  appears  in  those  expressions,  and  it  should  be  kept  in  mind  that  they 
can  be  reformulated  by  means  of  the  reweighting  procedure  given  in  Eq.  (p8|)  to  apply  to 
processes  with  different  explicitly  known  stationary  states.  On  the  other  hand,  one  might  be 
interested  in  the  infinite  projection  limit  p  —>  oo.  In  that  case,  one  might  use  a  Monte  Carlo 
process  for  which  the  stationary  distribution  is  not  known  explicitly.  Then,  the  expressions 
below  should  be  rewritten  so  that  the  unknown  distribution  does  not  appear  in  expressions 
for  the  time  averages.  The  function  ug  will  still  appear,  but  only  as  a  transformation  known 
in  closed  form  and  no  longer  as  the  stationary  state  of  P.  Clearly,  a  process  for  which  the 
distribution  is  not  known  in  closed  form  cannot  be  used  to  compute  the  matrix  elements 
X^3  P)  for  finite  p  and  p'  and  given  states  \ua)  and  \up). 

One  possible  choice  for  P  that  avoids  the  Metropolis  method  and  produces  finite  weights 
is  the  following  generalized  heat  bath  transition  matrix 


P(S'|S) 


G(S'\S) 

Es.  G(S,|S)' 


(34) 


If  G(S'|S)  is  symmetric,  this  transition  matrix  has  a  known  stationary  distribution,  viz., 
(jg(S)Mg(S),  where  Gg(S)  =  (S|G|Mg)/Mg(S),  the  configurational  eigenvalue  of  G  in  state  S. 
P  must  be  chosen  such  that  the  corresponding  transitions  can  be  sampled  directly.  This  is 
usually  not  feasible  unless  P  is  sparse  or  near-diagonal,  or  can  be  transformed  into  a  form 
involving  non-interacting  degrees  of  freedom.  We  note  that  if  P  is  defined  by  Eq.  (|34l),  the 
weight  matrix  g  depends  only  on  S. 


C.  Matrix  Elements 


We  now  address  the  issue  of  computing  the  matrix  elements  xjfg’p\  assuming  that  the 
stationary  state  u 2  is  known  explicitly  and  that  the  weight  matrix  g  has  hnite  elements. 
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We  shall  discuss  the  following  increasingly  complex  possibilities:  (a)  [. X ,  G]  —  0  and  A"  is 
near-diagonal  in  the  S  representation;  (b)  X  is  diagonal  in  the  S  representation;  (c)  A(S|S/) 
is  truly  off-diagonal.  The  fourth  case,  viz. ,  [A",  G]  —  0  and  X  is  not  near-diagonal  is  omitted 
since  it  can  easily  be  constructed  from  the  three  cases  discussed  explicitly.  When  discussing 
case  (c)  we  shall  introduce  the  concept  of  side  walks  and  explain  how  these  can  be  used  to 
compute  matrix  elements  of  a  more  general  nature  than  discussed  up  to  that  point.  After 
deriving  the  expressions,  we  shall  discuss  the  practical  problems  they  give  rise  to,  and  ways 
to  reduce  the  variance  of  the  statistical  estimators.  Since  this  yields  expressions  in  a  more 
symmetric  form,  we  introduce  the  transform 

V(S'|S)  =  itg(S')V  (S'|S) — h-.  (35) 

Ug{D) 


1.  [A,  G]  =  0  and  X  Near- Diagonal 


In  this  case,  X^f  +,J>  =  X^3 ,p)  and  it  suffices  to  consider  the  computation  of  X^\  By 
repeated  insertion  in  Eq.  (pA|)  of  the  resolution  of  the  identity  in  the  S-basis,  one  obtains 
the  expression 

X(V)  =  Eg, . s,  MS,)  W(S,)  into1  G(S,+1|S,)]  «g(S0) 

Es, . s.«„(Sr)[n&1G(Sw|Sj)]tig(So) 

In  the  steady  state,  a  series  of  subsequent  states  St,  St+i, . . . ,  St+P  occurs  with  probability 

p- 1 

Prob(St,St+1,...,St+p)  oc  [nnSi+i+i|St+i)]  ug(St)2.  (37) 

i= 0 

To  relate  products  of  the  matrix  P  to  those  of  G ,  it  is  convenient  to  introduce  the  following 
definitions 


p- 1 


Wt(p,q)  =  l[9(St+i+1\St+i) 


i=q 


Also,  we  define 


uu{ S)  = 


uu{  S) 


Mg(S)  ’ 

where  u  can  be  any  of  a  number  of  subscripts. 


With  these  definitions,  combining  Eqs.  (|29|) ,  (|36|),  and  (|37D,  one  hnds 

v(0,p)  r  Ht=i  Ua(St+P)  Xa(St+p)  Wt(p,  0)up(St) 

^ - 


£i= i  ua(St+p)  Wt(p,0)%(St) 


(38) 


(39) 


(40) 
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2.  Diagonal  X 


The  preceding  discussion  can  be  generalized  straightforwardly  to  the  case  in  which  X 
is  diagonal  in  the  S  representation.  Again  by  repeated  insertion  of  the  resolution  of  the 
identity  in  the  S-basis  in  the  Eq.  (B  for  X^pP\  one  obtains  the  identity 


X. 


(p'.p) 

a/3 


,S0 


MSP'+P)  [niT^G^+rlSO]  X(SP\SP)  [nUGtSi+ilSi)]  ^(So) 
Esp,+p,...,so  M^+P)  [nSp_1G(si+1|so]  M So) 


(41) 


Again  by  virtue  of  Eq.  we  find 


y(p',p) 

Xa/3 


lim 

L — eoc 


^2t=i  ticti^t+p'+p)  Wt(p'  +  p,p)X(  Sf+P| 

^ t+p )  Wt(p,O)u0(St) 
Ei=  1  fia(Sf+p'+p)  Wt(j/  +p,0)fi/j(St) 


(42) 


3.  Nondiagonal  X 


If  the  matrix  elements  of  G  vanish  only  if  those  of  X  do,  the  preceding  method  can  be 
generalized  immediately  to  the  final  case  in  which  A"  is  nondiagonal.  Then,  the  analog  of 
Eq.  (|2])  is 


Af >>  =  lim 

^  L— KX 


Et=i  ua(St+p'+p)  W'tip  T  PtP  T  l)^(St+p+i|S4+p)  lEt(p,0)^(St) 
Et=l  fla(St+p'+p)  Wt(p'  +  p,0)u, j(St) 


(43) 


where  the  x  matrix  elements  are  defined  by 


X(S'|S)  X(S'|S) 

“  P(S'|S)  ““  P(S'|S)' 


(44) 


Clearly,  the  preceding  definition  of  x(S'|S)  fails  when  P(S'|S)  vanishes  but  X(S'|S)  does  not. 
If  that  can  happen,  a  more  complicated  scheme  can  be  employed  in  which  one  introduces 
side-walks.  This  is  done  by  interrupting  the  continuing  stochastic  process  at  time  t  +  p  by 
introducing  a  finite  series  of  auxiliary  states  S(+p+1, . . . ,  S(+p/+p.  The  latter  are  generated 
by  a  separate  stochastic  process  so  that  in  equilibrium,  the  sequence  of  subsequent  states 
S t,  St+i, . . . ,  St+P,  S;+p+1, . . . ,  S(+p,+p  occurs  with  probability 

Prob[(Si;  Sm, . . . ,  St+P,  S(+p+1, . . . ,  S;+p,+p)]  oc 

[nftri1  P(s[+l+1\s't+l)}  Px{ S’t+p+1\st+p)  [n?=o  H st+i+1\st+i)}Ug(sty  (45) 


where  P\  is  a  Markov  matrix  chosen  to  replace  P  in  Eq.  ([H])  so  as  to  yield  finite  weights  x. 
In  this  scheme,  one  generates  a  continuing  thread  identical  to  the  usual  Monte  Carlo  process 
in  which  each  state  S t  is  sampled  from  the  stationary  state  of  P,  at  least  if  one  ignores  the 
initial  equilibration.  Each  state  S t  of  this  backbone  forms  the  beginning  of  a  side  walk,  the 
first  step  of  which  is  sampled  from  P\,  while  P  again  generates  subsequent  ones.  Clearly, 
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with  respect  to  the  side  walk,  the  first  step  disrupts  the  stationary  state,  so  that  the  p'  states 
S(,,  which  form  the  side  walk,  do  not  sample  the  stationary  state  of  the  original  stochastic 
process  generated  by  P,  unless  Px  coincidentally  has  the  same  stationary  state  as  P. 

A  problem  with  the  matrix  elements  we  dealt  with  up  to  now  is  that  in  the  limit  p'  or 
p  — >  oo  all  of  them  reduce  to  matrix  elements  involving  the  dominant  eigenstate,  although 
symmetries  might  be  used  to  yield  other  eigenstates  besides  the  absolute  dominant  one. 
However,  if  symmetries  fail,  one  has  to  employ  the  equivalent  of  an  orthogonalization  scheme, 
such  as,  discussed  in  the  next  section,  or  one  is  forced  to  resort  to  evolution  operators  that 
contain,  in  exact  or  in  approximate  form,  the  corresponding  projections.  An  example  of  this 
are  matrix  elements  computed  in  the  context  of  the  fixed-node  approximation!!!,  discussed 
in  Section  |VI  A  %  Within  the  framework  of  this  approximation,  one  considers  quantities  of 
the  form 


-(P',P)  =  {u^GjxG^up) 

vAo  I  GY  I «»)  (“0 1  Gf '  \up) 


(46) 


where  the  Gi  are  evolution  operators  combined  with  appropriate  projectors,  which  in  the 
fixed- node  approximation  are  defined  by  the  nodes  of  the  states  ua( S)  and  ^(S).  We  shall 
describe  how  the  preceding  expression,  Eq.  (|I6|),  can  be  evaluated,  but  rather  than  writing 
out  all  the  expressions  explicitly,  we  present  just  the  essence  of  the  Monte  Carlo  method. 

To  deal  with  these  expressions,  one  generates  a  backbone  time  series  of  states  sampled 
from  any  distribution,  say,  ug( S)2,  that  has  considerable  overlap  with  the  the  states  |wQ(S)| 
and  \up(S)\.  Let  us  distinguish  those  backbone  states  by  a  superscript  0.  Consider  any  such 
state  at  some  given  time.  It  forms  the  starting  point  of  two  side  walks.  We  denote  the 
states  of  these  side  walks  by  where  i  —  1,2  identifies  the  side  walk  and  tt  labels  the 
side  steps.  The  side  walks  are  generated  from  factorizations  of  the  usual  form,  defined  in 
Eq.  (PI),  say  Gi  =  cjiPi.  A  walk 

5  =  [S<°>,  (S'11,  S'11, . . .),  (S?>,  S<2), . . .)]  (47) 

occurs  with  probability 

Prob(S)  =  ug(S(0))2  A(S(0)|Sf))...JP2(S(0)|Si2))....  (48) 


We  leave  it  to  the  reader  to  show  that  this  probability  suffices  for  the  computation  of  all 
expressions  appearing  in  numerator  and  denominator  of  Eq.  (|46|),  in  the  case  that  X  is 
diagonal,  and  to  generate  the  appropriate  generalizations  to  other  cases. 

In  the  expressions  derived  above,  the  power  method  projections  precipitate  products  of 
reweighting  factors  g,  and,  as  the  projection  times  p  and  p'  increase,  the  variance  of  the 
Monte  Carlo  estimators  grows  at  least  exponentially  in  the  square  root  of  the  projection 
time.  Clearly,  the  presence  of  the  fluctuating  weights  g  is  due  to  the  fact  that  the  evolution 
operator  G  is  not  Markovian  in  the  sense  that  it  fails  to  conserve  probability.  The  importance 
sampling  transformation  Eq.  (|32|)  was  introduced  to  mitigate  this  problem.  In  Section  E-  an 
algorithm  involving  branching  walks  will  be  introduced,  which  is  a  different  strategy  devised 
to  deal  with  this  problem.  In  diffusion  and  transfer  matrix  Monte  Carlo,  both  strategies, 
importance  sampling  and  branching,  are  usually  employed  simultaneously. 
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D.  Excited  States 


Given  a  set  of  basis  states,  excited  eigenstates  can  be  computed  variationally  by  solving 
a  linear  variational  problem  and  the  Metropolis  method  can  be  used  to  evaluate  the  required 
matrix  elements.  The  methods  involving  the  power  method,  as  described  above,  can  then 
be  used  to  remove  the  variational  bias  systematically.@0E3 

In  this  context  matrix  elements  appear  in  the  solution  of  the  following  variational  prob¬ 
lem.  As  was  mentioned  several  times  before,  the  price  paid  for  reducing  the  variational  bias 
is  increased  statistical  noise,  a  problem  which  appears  in  this  context  with  a  vengeance. 
Again,  the  way  to  keep  this  problem  under  control  is  the  use  of  optimized  trial  vectors. 

The  variational  problem  to  be  solved  is  the  following  one.  Given  n  basis  functions  | iq), 
find  the  n  x  n  matrix  of  coefficients  dP'1  such  that 


W’j  >  =  E<*“k> 


(49) 


2=1 


are  the  best  variational  approximations  for  the  n  lowest  eigenstates  | fy)  of  some  Hamiltonian 
TL.  In  this  problem  we  shall  use  the  language  of  the  quantum  mechanical  systems,  where  one 
has  to  distinguish  the  Hamiltonian  from  the  evolution  operator  exp(—  rTt).  In  the  statistical 
mechanical  applications,  one  has  only  the  equivalent  of  the  latter.  In  the  expressions  to  be 
derived  below  the  substitution  HGP  — »  Gp+1  will  produce  the  expressions  required  for  the 
statistical  mechanical  applications,  at  least  if  we  assume  that  the  nonsymmetric  matrices 
that  appear  in  that  context  have  been  synmietrized.Q 

One  seeks  a  solution  to  the  linear  variational  problem  in  Eq.  (P?p  in  the  sense  that  for 
all  i  the  Rayleigh  quotient  ('0j|7d|'0j)/('i/’i|'0i)  is  stationary  with  respect  to  variation  of  the 
coefficients  d.  The  solution  is  that  the  matrix  of  coefficients  d  has  to  satisfy  the  following 
generalized  eigenvalue  equation 


£,£aw 


to 


(50) 


2=1 


2=1 


where 


Hki  (uk\H\Ui), 


(51) 


and 


Nki  =  (uk\ui).  (52) 

Before  discussing  Monte  Carlo  issues,  we  note  a  number  of  important  properties  of  this 
scheme.  First  of  all,  the  basis  states  |tq)  in  general  are  not  orthonormal  and  this  is  reflected 
by  the  fact  that  the  matrix  elements  of  N  have  to  be  computed.  Secondly,  it  is  clear  that  any 


^This  is  not  possible  in  general  for  transfer  matrices  of  systems  with  helical  boundary  conditions, 
but  the  connection  between  left  and  right  eigenvectors  of  the  transfer  matrix  (see  Section  I  B)  can 
be  used  to  generalize  the  approach  discussed  here. 
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nonsingular  linear  combination  of  the  basis  vectors  will  produce  precisely  the  same  results, 
obtained  from  the  correspondingly  transformed  version  of  Eq.  (^).  The  final  comment  is 
that  the  variational  eigenvalues  bound  the  exact  eigenvalues,  i.e.,  Ej  >  Ei.  One  recovers 
exact  eigenvalues  Ei  and  the  corresponding  eigenstates,  if  the  | Ui)  span  the  same  space  as 
the  exact  eigenstates. 

The  required  matrix  elements  can  be  computed  using  the  variational  Monte  Carlo  method 
discussed  in  the  previous  section.  Furthermore,  the  power  method  can  be  used  to  reduce 
the  variational  bias.  Formally,  one  simply  defines  new  basis  states 

KW)  =  Gp\Ui)  (53) 


and  substitutes  these  new  basis  states  for  the  original  ones.  The  corresponding  matrices 


(54) 


and 


NS  =  w> 


Cp)  i 


Xp)\ 

Ll  / 


(55) 


can  again  be  computed  by  applying  the  methods  introduced  in  Section  |T|  for  the  computa¬ 
tion  of  general  matrix  elements  by  a  Monte  Carlo  implementation  of  the  power  method. 

As  an  explicit  example  illustrating  the  nature  of  the  Monte  Carlo  time-averages  that 
one  has  to  evaluate  in  this  approach,  we  write  down  the  expression  for  l\k  as  used  for  the 
computation  of  eigenvalues  of  the  Markov  matrix  relevant  to  the  problem  of  critical  slowing 
down: 


K 


(p) 


E 


^b(s*)  Va(s  t+py 


(56) 


where  the  S*  are  configurations  forming  a  time  series  which,  as  we  recall,  is  designed  to 
sample  the  distribution  of  a  system  in  thermodynamic  equilibrium,  i.e.,  the  Boltzmann 
distribution  kn  The  expression  given  in  Eq.  (|56P  yields  the  u / ^B-auto-correlation  function 
at  lag  p.  The  expression  for  H-J'1  is  similar,  and  represents  a  cross-correlation  function 
involving  the  configurational  eigenvalues  of  the  Markov  matrix  in  the  various  basis  states. 
Compared  to  the  expressions  derived  in  Section  [TTT|.  Eq.  ffjTil)  takes  a  particularly  simple  form 
in  which  products  of  fluctuating  weights  are  absent,  because  in  this  particular  problem  one 
is  dealing  with  a  probability  conserving  evolution  operator  from  the  outset. 

Eq.  (|5fiP  shows  why  this  method  is  sometimes  called  correlation  function  Monte  Carlo, 
but  it  also  illustrates  a  new  feature,  namely,  that  it  is  efficient  to  compute  all  required 
matrix  elements  simultaneously.  This  can  be  done  by  generating  a  Monte  Carlo  process 
with  a  known  distribution  which  has  sufficient  overlap  with  all  k(S)|  =  |(S|iq)|-  This  can 
be  arranged,  for  example,  by  sampling  a  guiding  function  ug( S)  defined  by 


Wg(S) 


n 


\ 


y  Oj!ij(s)2, 


(57) 


where  the  coefficients  a*  approximately  normalize  the  basis  states  k)>  which  may  require 
a  preliminary  Monte  Carlo  run.  See  Ceperley  and  Bernu0  for  an  alternative  choice  for  a 
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guiding  function.  In  the  computations!!]  to  obtain  the  spectrum  of  the  Markov  matrix  in 
critical  dynamics,  as  illustrated  by  Eq.  (^6]),  the  Boltzmann  distribution,  is  used  as  a  guiding 
function.  It  apparently  has  enough  overlap  with  the  decaying  modes  that  no  special  purpose 
distribution  has  to  be  generated. 


E.  How  to  Avoid  Reweighting 

Before  discussing  the  branching  algorithms  designed  to  deal  more  efficiently  with  the 
reweighting  factors  appearing  in  the  expressions  discused  above,  we  briefly  mention  an  al¬ 
ternative  that  has  surfaced  occasionally  without  being  studied  extensively,  to  our  knowledge. 
The  idea  will  be  illustrated  in  the  case  of  the  computation  of  the  matrix  element  X^\  and 
we  take  Eqn.  (|36|)  as  our  starting  point.  In  statistical  mechanical  language,  we  introduce  a 
reduced  Hamiltonian 

p- 1 

H  =  lnug(Sp)  +  ^  lnG(Si+i|Si)  +  lnug(So)  (58) 

i= 0 

and  the  corresponding  Boltzmann  distribution  exp—  H(Sp, . . .  ,S0).  One  can  now  use  the 
standard  Metropolis  algorithm  to  sample  this  distribution  for  this  system  consisting  of  p  + 1 
layers  bounded  by  the  layers  0  and  p.  For  the  evaluation  of  Eq.  (j3Bj)  by  Monte  Carlo, 
this  expression  then  straightforwardly  becomes  a  ratio  of  correlation  functions  involving 
quantities  defined  at  the  boundaries.  To  see  this,  all  one  has  to  do  is  to  divide  the  numerator 
and  denominator  of  Eq.  (|36|)  by  the  partition  function 

Z  =  J2  e"w(Sp’-’So)  (59) 

Sp,...,So 

Note  that  in  general,  boundary  terms  involving  some  appropriately  defined  ug  should  be 
introduced  to  ensure  the  non- negativity  of  the  distribution.  For  the  simultaneous  compu¬ 
tation  of  matrix  elements  for  several  values  of  the  indices  a  and  /?,  a  guiding  function  ug 
should  be  chosen  that  has  considerable  overlap  with  the  corresponding  \ua\  and  | up\. 

The  Metropolis  algorithm  can  of  course  be  used  to  sample  any  probability  distribution, 
and  the  introduction  of  the  previous  Hamiltonian  illustrates  just  one  particular  point  of 
view.  If  one  applies  the  preceding  idea  to  the  case  of  the  imaginary-time  quantum  mechanical 
evolution  operator,  one  obtains  a  modified  version  of  the  standard  path-integral  Monte  Carlo 
method,  in  which  case  the  layers  are  usually  called  time  slices.  Clearly,  this  method  has  the 
advantage  of  suppressing  the  fluctuating  weights  in  estimators.  However,  the  disadvantage 
is  that  sampling  the  full,  layered  system  yields  a  longer  correlation  time  than  sampling  the 
single-layer  distribution  u2.  This  is  a  consequence  of  the  fact  that  the  microscopic  degrees 
of  freedom  are  more  strongly  correlated  in  a  layered  system  than  in  a  single  layer.  Our 
limited  experience  suggests  that  for  small  systems  reweighting  is  more  efficient,  whereas  the 
Metropolis  approach  tends  to  become  more  efficient  as  the  system  grows  in  size.H3 


IV.  TRIAL  FUNCTION  OPTIMIZATION 

In  the  previous  section  it  was  shown  that  eigenvalue  estimates  can  be  obtained  as  the 
eigenvalues  of  the  matrix  N ^  H^.  The  variational  error  in  these  estimates  decreases 
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as  p  increases.  In  general,  these  expressions  involve  weight  products  of  increasing  length, 
and  consequently  the  errors  grow  exponentially,  but  even  in  the  simple  case  of  a  probability 
conserving  evolution  operator,  errors  grow  exponentially.  This  is  a  consequence  of  the  fact 
that  the  auto-correlation  functions  in  N&\  and  the  cross-correlation  functions  in  H^p\  in 
the  limit  p  — ►  oo  reduce  to  quantities  that  contain  an  exponentially  vanishing  amount  of 
information  about  the  subdominant  or  excited-state  eigenvalues,  since  the  spectral  weight 
of  all  but  the  dominant  eigenstate  is  reduced  to  zero  by  the  power  method. 

The  practical  implication  is  that  this  information  has  to  be  retrieved  with  sufficient 
accuracy  for  small  values  of  p,  before  the  signal  disappears  in  the  statistical  noise.  The 
projection  time  p  can  be  kept  small  by  using  optimized  basis  states  constructed  to  reduce 
the  overlap  of  the  linear  space  spanned  by  the  basis  states  | ufi  with  the  space  spanned  by 
the  eigenstates  beyond  the  first  n  of  interest.  We  shall  describe,  mostly  qualitatively,  how 
this  can  be  done  by  a  generalization  of  a  method  used  for  optimization  of  individual  basis 
stateg^ffirE^  viz.  minimization  of  variance  of  the  configurational  eigenvalue,  the  local  energy 
in  quantum  Hamiltonian  problems. 

Suppose  that  ut(S,v)  is  the  value  of  the  trial  function  wT  for  configuration  S  and  some 
choice  of  the  parameters  v  to  be  optimized.  As  in  Eq.  ([L8D,  the  configurational  eigenvalue 
A (S,v)  of  configuration  S  is  defined  by 

u'T(S,v)  =  X(S,v)ut(S,v),  (60) 

where  a  prime  is  used  to  denote,  for  arbitrary  |/),  the  components  of  G\f),  or  H\f)  as  is 
more  convenient  for  quantum  mechanical  applications.  The  optimal  values  of  the  variational 
parameters  are  obtained  by  minimization  of  the  variance  of  A(S,  u),  estimated  as  an  average 
over  a  small  Monte  Carlo  sample.  In  the  ideal  case,  i.e.,  if  an  exact  eigenstate  can  be 
reproduced  by  some  choice  of  the  parameters  of  ut,  the  minimum  of  the  variance  yields  the 
exact  eigenstate  not  only  if  it  were  to  be  computed  exactly,  but  even  if  it  is  approximated 
by  summation  over  a  Monte  Carlo  sample.  A  similar  zero-variance  principle  holds  for  the 
method  of  simultaneous  optimization  of  several  trial  states  to  be  discussed  next.  This  is 
in  sharp  contrast  with  the  more  traditional  Rayleigh-Ritz  extremization  of  the  Rayleigh 
quotient,  which  frequently  can  produce  arbitrarily  poor  results  if  minimized  over  a  small 
sample  of  configurations. 

For  conceptual  simplicity,  we  first  generalize  the  preceding  method  to  the  more  general 
ideal  case  that  reproduces  the  exact  values  of  the  desired  n  eigenstates  of  the  evolution 
operator  G.  As  a  byproduct,  our  discussion  will  produce  an  alternative  to  the  derivation  of 
Eq.  (J5DJ).  To  compute  n  eigenvalues,  we  have  to  optimize  the  n  basis  states  |wj),  where  we 
have  dropped  the  index  “T”,  and  again  we  assume  we  have  a  sample  of  M  configurations  SQ, 
a  =  1, . . . ,  M  sampled  from  u2.  The  case  we  consider  is  ideal  in  the  sense  that  we  assume 
that  these  basis  states  | ufi  span  an  n-dimensional  invariant  subspace  of  G.  In  that  case,  by 
definition  there  exists  a  matrix  A  of  order  n  such  that 

n 

Ui{  sa)  =  ^AijUfiSa).  (61) 

3= 1 

Again,  the  prime  on  the  left-hand  side  of  this  equation  indicates  multiplication  by  G  or 
by  7 i.  =  —  T-1lnG.  If  the  number  of  configurations  is  large  enough,  A  is  for  all  practical 
purposes  determined  uniquely  by  the  set  of  equations  (|6l|)  and  one  finds 
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A  =  N-'H, 


(62) 


where 


^  =  ±T%=1ui(Sa)uj(Sa)/us(Sa)\ 

Hij  =  ±  Ea=l  u^u'^/u^f.  (63) 

In  the  limit  M  — >  oo  this  indeed  reproduces  the  matrices  N  and  H  in  Eq.  (|50|).  In  the 
nonideal  case,  the  space  spanned  by  the  n  basis  states  | uf)  is  not  an  invariant  subspace  of 
the  matrix  G.  In  that  case,  even  though  Eq.  (^TJ)  generically  has  no  true  solution,  Eqs.  (|62|) 
and  (|63|)  still  constitute  a  solution  in  the  least-squares  sense,  as  the  reader  is  invited  to  show 
for  himself  by  solving  the  normal  equations. 

Next,  let  us  consider  the  construction  of  a  generalized  optimization  criterion.  As  men¬ 
tioned  before,  if  a  set  of  basis  states  span  an  invariant  subspace,  so  does  any  nonsingular 
linear  combination.  In  principle,  the  optimization  criterion  should  have  the  same  invariance. 
The  matrix  A  lacks  this  property,  but  its  spectrum  is  invariant.  Another  consideration  is 
that,  while  the  local  eigenvalue  is  defined  by  a  single  configuration  S,  it  takes  at  least  n 
configurations  to  determine  the  “local”  matrix  A.  This  suggests  that  one  subdivide  the 
sample  into  subsamples  of  at  least  n  configurations  each  and  minimize  the  variance  of  the 
local  spectrum  over  these  subsamples.  Again  in  principle,  this  has  the  advantage  that  the 
optimization  can  exploit  the  fact  that  linear  combinations  of  the  basis  states  have  more 
variational  freedom  to  represent  the  eigenstates  than  does  each  variational  basis  function 
separately.  In  practice,  however,  this  advantage  seems  to  be  negated  by  the  difficulty  of  find¬ 
ing  good  optimal  parameters.  This  is  a  consequence  of  the  fact  that  invariance  under  linear 
transformation  usually  can  be  mimicked  by  variation  of  the  parameters  of  the  basis  states. 
In  other  words,  a  linear  combination  of  basis  states  can  be  represented  accurately,  at  least 
relative  to  the  noise  in  the  local  spectrum,  by  a  single  basis  state  with  appropriately  chosen 
parameters.  Consequently,  intrinsic  flaws  of  the  trial  states  exceed  what  can  be  gained  in 
reducing  the  variance  of  the  local  spectrum  by  exploiting  the  linear  variational  freedom, 
most  of  which  is  already  used  anyway  in  the  linear  variational  problem  that  was  discussed 
at  the  beginning  of  this  section.  This  means  that  one  has  to  contend  with  a  near-singular 
nonlinear  optimization  problem.  In  practice,  to  avoid  the  concomitant  slow  convergence, 
it  seems  to  be  more  efficient  to  break  the  “gauge  symmetry”  and  select  a  preferred  basis, 
which  most  naturally  is  done  by  requiring  that  each  basis  state  itself  is  a  good  approximate 
eigenstate. 

The  preceding  considerations,  of  course,  leave  us  with  two  criteria,  viz.  minimization 
of  the  variance  of  the  local  spectrum  as  a  whole,  and  minimization  of  the  variance  of  the 
configurational  eigenvalue  separately.  To  be  of  practical  use,  both  criteria  have  to  be  com¬ 
bined,  since  if  one  were  to  proceed  just  by  minimization  of  the  variance  of  the  configurational 
eigenvalues  separately,  one  would  simply  keep  reproducing  the  same  eigenstate.  In  a  non- 
Monte  Carlo  context  this  can  be  solved  simply  by  some  orthogonalization  scheme,  but  as 
far  as  Monte  Carlo  is  concerned,  that  is  undesirable  since  it  fails  to  yield  a  zero-variance 
optimization  principle. 


21 


V.  BRANCHING  MONTE  CARLO 


In  Section  0  we  discussed  a  method  to  compute  Monte  Carlo  averages  by  exploiting  the 
power  method  to  reduce  the  spectral  weight  of  undesirable,  subdominant  eigenstates.  We 
saw  that  this  leads  to  products  of  weights  of  subsequent  configurations  sampled  by  a  Monte 
Carlo  time  series.  To  suppress  completely  the  systematic  errors  due  to  finite  projection 
times,  i.e.,  the  variational  bias,  one  has  to  take  averages  of  infinite  products  of  weights. 
This  limit  would  produce  an  “exact”  method  with  infinite  variance,  which  is  of  no  practical 
use. 

We  have  also  discussed  how  optimized  trial  states  can  be  used  to  reduce  the  variance  of 
this  method.  The  variance  reduction  may  come  about  in  two  ways.  In  the  first  place,  by 
starting  with  optimized  trial  states  of  higher  quality,  the  variational  bias  is  smaller  to  begin 
with  so  that  fewer  power  method  projections  are  required.  In  practical  terms,  this  leads  to  a 
reduction  of  the  number  of  factors  in  the  fluctuating  products.  Secondly,  a  good  estimate  of 
the  dominant  eigenstate,  can  be  used  to  reduce  the  amount  by  which  the  evolution  operator, 
divided  by  an  appropriate  constant,  violates  conservation  of  probability,  which  reduces  the 
variance  of  the  individual  fluctuating  weight  factors.  All  these  considerations  also  apply 
to  the  branching  Monte  Carlo  algorithm  discussed  in  this  section,  which  can  be  modified 
accordingly  and  in  complete  analogy  with  our  previous  discussion. 

Before  discussing  the  details  of  the  branching  algorithm,  we  mention  that  the  algorithm 
presented  lierecl  contains  the  mathematical  essence  of  both  the  diffusion  and  transfer  matrix 
Monte  Carlo  algorithms.  A  related  algorithm,  viz.,  Green  function  Monte  Carlo,  adds  yet 
another  level  of  complexity  due  to  the  fact  that  the  evolution  operator  is  known  only  as  an 
infinite  series.  This  series  is  stochastically  summed  at  each  step  of  the  power  method  itera¬ 
tions.  In  practice  this  implies  that  even  the  time  step  becomes  stochastic  and  intermediate 
Monte  Carlo  configurations  are  generated  that  do  not  contribute  to  expectation  values.  Nei¬ 
ther  Green  function  Monte  Carlo,  nor  its  generalization  designed  to  compute  quantities  at 
non-zero  temperature^,  will  be  discussed  in  this  chapter  and  we  refer  the  interested  reader 
to  the  literature  for  further  details.i!~§l 

Let  us  consider  in  detail  the  mechanism  that  produces  large  variance.  This  will  allow 
us  to  explain  what  branching  accomplishes  if  one  has  to  compute  products  of  many  (ideally 
infinitely  many)  fluctuating  weights.  The  time  average  over  these  products  will  typically  be 
dominated  by  only  very  few  large  terms;  the  small  terms  are  equally  expensive  to  compute, 
but  play  no  significant  role  in  the  average.  This  problem  can  be  solved  by  performing  many 
simultaneous  Monte  Carlo  walks.  One  evolves  a  collection  of  walkers  from  one  generation  to 
the  next  and  the  key  idea  is  to  eliminate  the  light-weight  walkers  which  produce  relatively 
small  contributions  to  the  time  average.  To  keep  the  number  of  walkers  reasonably  con¬ 
stant,  heavy-weight  walkers  are  duplicated  and  the  clones  are  subsequently  evolved  (almost) 
independently. 

An  algorithm  designed  according  to  this  concept  does  not  cut  off  the  products  over 
weights  and  therefore  seems  to  correspond  to  infinite  projection  time.  It  would  therefore 
seem  that  the  time  average  over  a  stationary  branching  process  corresponds  to  an  average 
over  the  exact  dominant  eigenstate  of  the  Monte  Carlo  evolution  operator,  but,  as  we  shall 
see,  this  is  rigorously  the  case  only  in  the  limit  of  an  infinite  number  of  walkerscuhu;  for  any 
finite  number  of  walkers,  the  stationary  distribution  has  a  bias  inversely  proportional  to  the 
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number  of  walkers,  the  so-called  population  control  bias.  If  the  fluctuations  in  the  weights 
are  small  and  correlations  (discussed  later)  decay  rapidly,  this  bias  tends  to  be  small.  In 
many  applications  this  appears  to  be  the  case  and  the  corresponding  bias  is  statistically 
insignificant.  However,  if  these  methods  are  applied  to  statistical  mechanical  systems  at  the 
critical  point,  significant  bias  can  be  introduced.  We  shall  discuss  a  simple  method  of  nearly 
vanishing  computational  cost  to  detect  this  bias  and  correct  for  it  in  all  but  the  worst-cases 
scenarios. 

To  discuss  the  branching  Monte  Carlo  version  of  the  power  method,  we  continue  to 
use  the  notation  introduced  above  and  again  consider  the  Monte  Carlo  evolution  operator 
G(S'|S).  As  above,  the  states  S  and  S'  will  be  treated  here  as  discrete,  but  in  practice  the 
distinction  between  continuous  and  discrete  states  is  a  minor  technicality,  and  generalization 
to  the  continuous  case  follows  immediately  by  replacing  sums  by  integrals  and  by  replacing 
Kronecker  <5’s  by  Dirac  5  functions. 

To  implement  the  power  method  iterations  in  Eq.  ([L4|)  by  a  branching  Monte  Carlo 
process,  | u®)  is  represented  by  a  collection  of  Nt  walkers,  where  a  walker  by  definition 
is  a  state-weight  pair  (SQ,  wa),  a  =  1, . . . ,  Nt.  As  usual,  the  state  variable  Sa  represents  a 
possible  configuration  of  the  system  evolving  according  to  G,  and  wa  represents  the  statistical 
weight  of  walker  a.  These  weights  appear  in  averages  and  the  efficiency  of  the  branching 
Monte  Carlo  algorithm  is  realized  by  maintaining  the  weights  in  some  range  W\  <  wa  <  wu, 
where  w\  and  wu  are  bounds  introduced  so  as  to  keep  all  weights  wa  of  the  same  order  of 
magnitude. 

The  first  idea  is  to  interpret  a  collection  of  walkers  that  make  up  generation  t  as  a 
representation  of  the  (sparse)  vector  \yM')  with  components 


(S|w^)  =  «(t)(S)  =  ^2  wa8s,s0 


Nt 


(64) 


a= 1 


where  S  is  the  usual  Kronecker  5-function.  The  underbar  is  used  to  indicate  that  the  «^( S) 
represent  a  stochastic  vector  \u^).  Of  course,  the  same  is  true  formally  for  the  single 
thread  Monte  Carlo.  The  new  feature  is  that  one  can  think  of  the  collective  of  walkers  as  a 
reasonably  accurate  representation  of  the  stationary  state  at  each  time  step,  rather  than  in 
the  long  run. 

The  second  idea  is  to  define  a  stochastic  process  in  which  the  walkers  evolve  with  tran¬ 
sition  probabilities  such  that  the  expectation  value  of  ct+\  \u('t+1's)1  as  represented  by  the 
walkers  of  generation  t  +  1,  equals  for  any  given  collection  of  walkers  representing 

|uW).  It  is  tempting  to  conclude  that,  owing  to  this  construction,  the  basic  recursion  re¬ 
lation  of  the  power  method,  Eq.  (0).  is  satisfied  in  an  average  sense,  but  this  conclusion 
is  not  quite  correct.  The  reason  is  that  in  practice,  the  constants  ct  are  defined  on  the  fly. 
Consequently,  q+ i  and  | vM+1^)  are  correlated  random  variables  and  therefore  there  is  no 
guarantee  that  the  stationary  state  expectation  value  of  | u®)  is  exactly  an  eigenstate  of  G, 
except  in  the  limit  of  nonfluctuating  normalization  constants  ct,  which,  as  we  shall  see,  is 
tantamount  to  an  infinite  number  of  walkers.  More  explicitly,  the  problem  is  that  if  one 


takes  the  time  average  of  Eq.  (|l4l)  and  if  the  fluctuations  of  the  ct+i  are  correlated  with 
| u®)  or|u)m))  one  does  not  produce  the  same  state  on  the  left-  and  right-hand  sides  of  the 
time-averaged  equation  and  therefore  the  time-averaged  state  need  not  satisfy  the  eigenvalue 
equation.  The  resulting  bias  has  been  discussed  in  the  various  contextsJhahffill 
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One  way  to  define  a  stochastic  process  is  to  rewrite  the  power  method  iteration  Eq.  fll4|) 


as 


where 


M(,+1)(S')  =  —  E  P(S'|S)9(S)u<*>(S), 

O+l  s 


(65) 


g( S)  =  £G(S'|S)  C(S'|S)  =  G(S'|S)/S(S).  (66) 

S' 

This  is  in  fact  what  how  transfer  matrix  Monte  Carlo  is  defined.  Referring  the  reader  back 
to  the  discussion  of  Eq.  (^),  we  note  that  in  diffusion  Monte  Carlo  the  weight  D  is  defined 
so  that  it  is  not  just  a  function  of  the  initial  state  S,  but  also  depends  on  the  final  S'. 
The  algorithm  given  below  can  trivially  be  generalized  to  accommodate  this  by  making  the 
substitution  g( S)  — ■>  g(S'|S). 

Equation  (|65|)  describes  a  process  represented  by  a  Monte  Carlo  run  which,  after  a  few 
initial  equilibration  steps,  consists  of  a  time  series  of  Mo  updates  of  all  walkers  at  times 
labeled  by  t  =  . . . ,  0, 1, . . . ,  M0.  The  update  at  time  t  consists  of  two  steps  designed  to 
perform  stochastically  the  matrix  multiplications  in  Eq.  (|65|).  Following  Nightingale  and 
B15te,0  the  process  is  defined  by  the  following  steps.  Let  us  consider  one  of  these  updates, 
the  one  that  transforms  the  generation  of  walkers  at  time  t  into  the  generation  at  time  t+  1. 
We  denote  variables  pertaining  to  times  t  and  t  +  1  respectively  by  unprimed  and  primed 
symbols. 

1.  Update  the  old  walker  (S j,Wj)  to  yield  a  temporary  walker  (S'*,  to')  according  to  the 
transition  probability  P(S',jSj),  where  w[  =  g(Si)wi/c',  for  i  =  1, . . .  ,Nt.  Step  two, 
defined  below,  can  change  the  number  of  walkers.  To  maintain  their  number  close  to 
a  target  number,  say  N0,  choose  c'  =  Ao(W/-^ro)1^s,  where  Ao  is  a  running  estimate  of 
the  eigenvalue  Ao  to  be  calculated,  where  s  >  1  [see  discussion  after  Eq.  (|68|)1 . 

2.  From  the  temporary  walkers  construct  the  new  generation  of  walkers  as  follows 

(a)  Split  each  walker  (S',  w')  for  which  w'  >  bu  into  two  walkers  (S',  \w').  The  choice 
bu  =  2  is  a  reasonable  one. 

(b)  Join  pairs  (S and  (S'j,u>')  with  w[  <  b\  and  w'  <  b\  to  produce  a  single 
walker  (S'fc,u;'  +  in'),  where  S'*,  =  S';  or  S'^  =  S''  with  relative  probabilities  w[ 
and  Wj.  Here  b\  =  1/2  is  reasonable. 

(c)  Walkers  for  which  b\  <  w[  <  bu,  or  left  single  in  step  [2b],  become  members  of  the 
new  generation  of  walkers. 

Note  that,  if  the  weights  g( S)  fluctuate  on  average  more  than  by  a  factor  of  two,  multiple 
split  and  join  operations  may  be  needed. 

ft  may  help  to  explicate  why  wildly  fluctuating  weight  adversely  impact  the  efficiency 
of  the  algorithm.  In  that  case,  some  walkers  will  have  multiple  descendents  in  the  next 
generations,  whereas  others  will  have  none.  This  leads  to  an  inefficient  algorithm  since  any 
generation  will  have  several  walkers  that  are  either  identical  or  are  closely  related,  which 
will  produce  strongly  correlated  contributions  to  the  statistical  time  averages.  In  its  final 
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analysis,  this  is  the  same  old  problem  that  we  encountered  in  a  single-thread  algorithm, 
where  averages  would  be  dominated  by  few  terms  with  relatively  large,  explicitly  given 
statistical  weights.  Branching  mitigates  this  problem  since  walkers  that  are  descendants  of 
a  given  walker  eventually  decorrelate,  but,  as  discussed  in  Sections  |ITI|  and  |V|.  the  best  cure 
is  importance  sampling  and  in  practice  both  strategies  are  used  simultaneously. 

The  algorithm  described  above  was  constructed  so  that  for  any  given  realization  of  | u®), 
the  expectation  value  of  ct+i\u}t+1'1) ,  in  accordance  with  Eq.  (ED,  satisfies 


E 


ci+i|M(‘+1>) 


G|m(,)>, 


(67) 


where  E(-)  denotes  the  conditional  average  over  the  transitions  defined  by  the  preceding 
stochastic  process.  More  generally,  by  p-fold  iteration  one  hndsEa 

=  Gp\u^).  (68) 


E 


IWd  b<l+!>)) 


.  \h= 1 


The  stationary  state  average  of  | w^)  is  close  to  the  dominant  eigenvector  of  G ,  but, 
as  mentioned  above,  it  has  a  systematic  bias,  proportional  to  1  /Nt,  when  the  number  Nt 
of  walkers  is  finite.  If,  as  is  the  case  in  some  applications,  this  bias  exceeds  the  statistical 
errors,  one  can  again  rely  on  the  power  method  to  reduce  this  bias  by  increasing  p.  If  that 
is  done,  one  is  back  to  the  old  problem  of  having  to  average  products  of  fluctuating  weights, 
and,  as  usual,  the  variance  of  the  corresponding  estimators  increases  as  their  bias  decreases. 
Fortunately,  in  practice  the  population  control  bias  of  the  stationary  state  is  quite  small,  if 
at  all  detectable,  but  even  in  those  cases,  expectation  values  involving  several  values  of  p 
should  be  computed  to  verify  the  absence  of  population  control  bias.  The  reader  is  referred 
to  Refs.  iMfor  a  more  detailed  discussion  of  this  problem.  Suffice  it  to  mention  here, 
first  of  all,  that  s,  as  defined  in  the  first  step  of  the  branching  algorithm  given  above  is  the 
expected  number  of  time  steps  it  takes  to  restore  the  number  of  walkers  to  its  target  value 
N0  and,  secondly,  that  strong  population  control  (s  =  1)  tends  to  introduce  a  stronger  bias 
than  weaker  control  (s  >  1). 

With  Eq.  (p8p  one  constructs  an  estimator^  of  the  dominant  eigenvector  IrA00))  of  the 
evolution  operator  G: 


i  M0  (p-1 

\^)  =  irY.  IR-dk®)- 


Mi 


o  t=  1  \  6=0 


(69) 


For  p  —  0,  in  which  case  the  product  over  b  reduces  to  unity,  this  yields  the  stationary 
state  of  the  branching  Monte  Carlo  which  frequently  is  treated  as  the  dominant  eigenstate 
of  G. 

Clearly,  this  branching  Monte  Carlo  algorithm  can  be  used  to  compute  the  right-projected 
mixed  estimators  that  were  denoted  by  in  Section  0  For  this  purpose  one  sets  p'  —  0 

in  Eq.  (|T5|)  and  makes  the  substitution  (ua\  =  («t|  and  Gp\up)\  =  | u^).  Expressions  for 
several  values  of  p  can  be  computed  simultaneously  and  virtually  for  the  price  of  one.  We 
explicitly  mention  the  important  special  case  obtained  by  choosing  for  the  general  operator 
A"  the  evolution  operator  G  itself.  This  yields  the  following  estimator  for  the  dominant 
eigenvalue  Aq  of  G: 
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gS  (ng.pc^i.)  n-,|,) 

E&dnCA-s)^1'-11’ 


where 


Nt 


=  £><V(S,). 


(70) 


(71) 


2=1 


In  diffusion  Monte  Carlo  this  estimator  can  be  used  to  construct  the  growth  estimate  of  the 
ground  state  energy.  That  is,  since  in  that  special  case  G  ~  exp(— tH),  eigenvalues  of  the 
evolution  operator  and  the  Hamiltonian  are  related  by 


E0  =  --  In  A0. 
r 


(72) 


Besides  expressions  such  as  Eq.  m  one  can  construct  expresssions  with  reduced  vari¬ 
ance.  These  involve  the  configurational  eigenvalue  of  G  or  7i  in  the  same  way  this  was  done 
in  our  discussion  of  the  single-thread  algorithm. 

Again,  in  practical  applications  it  is  important  to  combine  the  raw  branching  algorithm 
defined  above  with  importance  sampling.  Mathematically,  this  works  in  precisely  the  same 
way  as  in  Section  |TTT]  in  that  one  reformulates  the  same  algorithm  in  terms  of  the  similarity 
transform  G  with  ug  =  ut  chosen  to  be  an  accurate,  approximate  dominant  eigenstate  [see 
Eq.  (|32|)  1 .  In  the  single-thread  algorithm,  the  result  is  that  the  fluctuations  of  the  weights 
g  and  their  products  are  reduced.  In  the  context  of  the  branching  algorithm,  this  yields 
reduced  fluctuations  in  the  weight  of  walkers  individually  and  in  the  size  of  the  walker 
population.  One  result  is  that  the  population  control  bias  is  reduced.  If  we  ignore  this 
bias,  a  more  fundamental  difference  is  that  the  steady  state  of  the  branching  algorithm  is 
modified.  That  is,  in  the  raw  algorithm  the  walkers  sample  the  dominant  eigenstate  of  G, 
i.e.,  Vh(S),  but,  if  the  trial  state  |«t)  is  used  for  importance  sampling,  the  distribution  is 
mt(S)V;o(S),  which,  of  course,  is  simply  the  dominant  eigenstate  of  G. 

So  far,  we  have  only  discussed  how  mixed  expectation  values  can  be  computed  with 
the  branching  Monte  Carlo  algorithm  and,  as  was  mentioned  before,  this  yields  the  desired 
result  only  if  one  deals  with  operators  that  commute  with  the  evolution  operator  G.  This 
algorithm  can,  however,  also  be  used  to  perform  power  method  projections  to  the  left.  In 
fact,  most  of  the  concepts  discussed  in  Sections  |III  C  l|JTl  C  2|,  and  |III  C~3|  can  be  implemented 
straightforwardly.  To  illustrate  this  point  we  shall  show  how  one  can  compute  the  left-  and 
right-projected  expectation  value  of  a  diagonal  operator  X.  Since  the  branching  algorithm 
is  designed  to  explicitly  perform  the  multiplication  by  G  including  all  weights,  all  that  is 
required  is  the  following  generalization!^,  called  forward  or  future  walking. 

Rather  than  defining  a  walker  to  be  the  pair  formed  by  a  state  and  a  weight,  for  for¬ 
ward  walking  we  define  the  walker  to  be  of  the  form  [S,  w,  A(S_i), ...,  A(S_p/)],  where 
S_i,S_2, ...  are  previous  states  of  the  walker.  I11  other  words,  each  walker  is  equipped 
with  a  finite  stack  of  depth  p'  of  previous  values  of  the  diagonal  operator  A".  I11  going  from 
one  generation  of  walkers  to  the  next,  the  state  and  weight  of  a  walker  are  updated  just  as 
before  to  S'  and  w'.  The  only  new  feature  is  that  the  value  A"(S)  is  pushed  on  the  stack: 
[S,  w,  A"(S_i), ...,  A"(S_p/)]  — ■>  [S',  ru',  A(S),  A(S_i), ...,  A(S_p/+i)].  In  this  way,  the  //  times 
left-projected  expectation  value  of  X  is  obtained  simply  by  replacing  A(S)  by  A"(S_p/). 
Note  that  one  saves  the  history  of  A  rather  than  a  history  of  configurations  only  for  the 
purpose  of  efficient  use  of  computer  memory. 
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VI.  DIFFUSION  MONTE  CARLO 


A.  Simple  Diffusion  Monte  Carlo  Algorithm 

The  diffusion  Monte  Carlo  method®  @  discussed  in  this  section  is  an  example  of  the 
general  class  of  algorithms  discussed  in  this  chapter,  all  of  which  rely  on  stochastic  imple¬ 
mentation  of  the  power  method  to  increase  the  relative  spectral  weight  of  the  eigenstates  of 
interest.  In  the  quantum  mechanical  context  of  the  current  section,  the  latter  is  usually  the 
ground  state.  In  this  general  sense,  there  is  nothing  new  in  this  section.  However,  a  number 
of  features  enter  that  lead  to  technical  challenges  that  cannot  be  dealt  with  in  a  general 
setting.  There  is  the  problem  that  the  projector,  the  evolution  operator  G  =  exp(— r7i)} 
is  only  known  in  the  limit  of  small  imaginary  time  r,  and  in  addition,  for  applications  to 
electronic  structure  problems,  there  are  specific  practical  difficulties  associated  with  nodes 
and  cusps  in  the  wave  functions.  Bosonic  ground  states  are  simpler  in  that  they  lack  nodes, 
and  we  shall  deal  with  those  systems  only  in  passing.  As  in  the  rest  of  this  chapter,  we 
deal  only  with  the  basic  concepts.  In  particular,  we  focus  on  considerations  relevant  to  the 
design  of  an  efficient  diffusion  Monte  Carlo  algorithm.  The  reader  is  referred  to  Refs.  mm 
for  applications  and  other  issues  not  covered  here. 

For  quantum  mechanical  problems,  the  power  method  iteration  Eq.  (|14|)  takes  the  form 


\4>(t  +  r))  =  e 


_  -(W-.Et)t 


1 


(73) 


Here  E t  is  a  shift  in  energy  such  that  E0  —  E t  ~  0,  where  Eq  is  the  ground  state  energy. 
In  the  real-space  representation  we  have 

V>(R ',t  +  r)  =  JdR  (R,|e-(w-&r)r|R)  (R|V>(*))  =  Ji dR  G(R',  R, t)  V>(R,t),  (74) 

In  practical  Monte  Carlo  settings,  the  shift  E x  is  computed  on  the  fly  and  consequently 
is  a  slowly  varying,  nearly  constant  function  of  time,  but  for  the  moment  we  take  it  to  be 
constant.  The  wavefunction  "0(R,  t)  is  the  solution  to  the  Schrodinger  equation  in  imaginary 
time 


2/i 


V2^(R,  t)  +  [V(R)  -  Et]^(R,  t)  = 


dip(  R,  t) 
dt 


(75) 


To  make  contact  with  Eq.  fl29|)  we  should  factor  the  Green  function  into  a  probability 
conserving  part  and  a  weight.  For  small  times,  this  can  be  accomplished  by  the  following 


approximation.  If,  on  the  left-hand  side,  we  just  had  the  first  term,  Eq.  (75)  would  reduce 
to  a  diffusion  equation,  whence  the  method  gets  its  name.  For  diffusion,  the  Green  function 
is  probability  conserving  and  is  given  by 


P(R',R,r) 


g-MR'-R  )2/2t 
(27rr//x)3n/2 


(76) 


If,  on  the  other  hand,  we  just  had  the  second  term  then  Eq.  ([T5|)  would  reduce  to  a  rate 
equation  for  which  the  Green  function  is  h(R'  —  R)  with  prefactor 


p(R,,R,r)  =  eT^-[v(R)+v(R/)]/2>. 


(77) 
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In  combining  these  ingredients,  we  have  to  contend  with  the  following  general  relation 
for  noncommuting  operators  7 d* 

e(Wi+H2)r  =  e\UxTji*re\‘HxT  +  3)  =  e«i  TgWar  +  £>^2).  (78) 


As  long  as  one  is  dealing  with  a  5-function,  the  weight  in  Eq.  ([77])  is  evaluated  always  at 
R '  =  R,  and  therefore  the  expression  on  the  right  can  be  written  also  in  non-symmetric  form. 
However,  Eq.  (|78|)  suggests  that  the  exponent  in  Eq.  ([77])  should  be  used  in  symmetric  fashion 
as  written.  This  is  indeed  the  form  we  shall  employ  for  the  time  being,  but  we  note  that 
the  final  version  of  the  diffusion  algorithm  employs  a  nonsymmetric  split  of  the  exponent, 
since  proposed  moves  are  not  always  accepted.  Since  there  are  other  sources  of  0{r2) 
contributions  anyway  this  does  not  degrade  the  performance  of  the  algorithm.  Combination 
of  the  preceding  ingredients  yields  the  following  approximate,  short-time  Green  function  for 
Eq.  m 


G(R',R,t) 


e-MR'-R)2/2r 

(27rr//i)3n/2 

e-M(R'-R)2/2r 

(27rr//i)3n/2 


er{Sr-[V(R')+V(R)]/2}  +  ^,/JU 
er[ET- V(R)]  +  2) 


(79) 


For  this  problem,  the  power  method  can  be  implemented  by  Monte  Carlo  by  means 
of  both  the  single-thread  scheme  discussed  in  Section  JT|  and  the  branching  algorithm  of 
Section  |V|.  We  shall  use  the  latter  option  in  our  discussion.  In  this  case,  one  performs  the 
following  steps.  Walkers  of  the  zeroth  generation  are  sampled  from  ^(R,  0)  =  -0t(R)  using 
the  Metropolis  algorithm.  The  walkers  are  propagated  forward  an  imaginary  time  r  by 
sampling  a  new  position  R'  from  a  multivariate  Gaussian  centered  at  the  old  position  R  and 
multiplying  the  weights  of  the  walkers  by  the  second  factor  in  Eq.  (|79|).  Then  the  split/join 
step  of  the  branching  algorithm  is  performed  to  obtain  the  next  generation  of  walkers,  with 
weights  in  the  targeted  range. 


1.  Diffusion  Monte  Carlo  with  Importance  Sampling 


For  many  problems  of  interest,  the  potential  energy  V(R)  exhibits  large  variations  over 
coordinate  space  and  in  fact  may  diverge  at  particle  coincidence  points.  As  we  have  discussed 
in  the  general  case,  the  fluctuations  of  weights  g,  produce  noisy  statistical  estimates.  As 
described  in  Sections  m  and  this  problem  can  be  greatly  mitigated  by  applying  the  sim¬ 
ilarity  (or  importance  sampling)  transfornrationil’i^  to  the  evolution  operator.  Employing 
the  general  mathematical  identity  S'exp(— t7I)S~1  =  exp(— tSHS^1),  this  transformation 
can  be  applied  conveniently  to  the  Hamiltonian.  That  is,  given  a  trial  function  ^>t(R)  one 
can  introduce  a  distribution  /(R,  t)  =  ^t(R)^(R,  t).  If  ^(R)^)  satisfies  the  Schrodinger 
equation  fEq.  (|75[)1,  it  is  a  simple  calculus  exercise  to  show  that  /  is  a  solution  of  the 
equatioiiLjEJ 

-  et)^t(R)-1/(Ra)  = 

-^V2/(R,*)  +  V  ■  [V(R)/(R, t)]  -  S(R)/(R,t)  =  (80) 
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Here  the  velocity  V  is  a  function  (not  an  operator),  often  referred  to  in  the  literature  as  the 
quantum  force,  and  is  given  by 


V(R)  =  (Vl,...,vn) 


1  V^t(R) 
A4  ^t(R) 


(81) 


The  coefficient  of  the  source  term,  which  is  responsible  for  branching  in  the  diffusion  Monte 
Carlo  context,  is 


S(R)  =  3r-Eh(R), 


(82) 


which  is  defined  in  terms  of  the  local  energy 


£l(R) 


Wt(R) 

Vt(R) 


1  V2^t(R) 

2 fi  V;t(R) 


+  V(R), 


(83) 


the  equivalent  of  the  configurational  eigenvalue  introduced  in  Eq.  (|T8|) . 

Compared  to  the  original  Schrodinger  equation,  to  which  of  course  it  reduces  for  the 
case  if) T  =  1,  the  second  term  in  Eq.  ([SO])  is  new,  and  corresponds  to  drift.  Again,  one 
can  explicitly  write  down  the  Green  function  of  the  equation  with  just  a  single  term  on  the 
left-hand  side.  The  drift  Green  function  G*d  of  the  equation  obtained  by  suppressing  all  but 
this  term  is 


GD(R',R,  r)  =  5[R'-R(r)] 
where  R(r)  satisfies  the  differential  equation 


(84) 


dR 

dt 


V(R) 


(85) 


subject  to  the  boundary  condition  R(0)  =  R.  Again,  at  short  times  the  noncommutativity 
of  the  various  operators  on  the  left-hand  side  of  the  equation  can  be  ignored  and  thus  one 
obtains  the  following  short-time  Green  function. 


G(R',R,r)  =  dR"  5 


R"  —  R(r) 


3-At(R'-R")2/2r 

(27t  r//x)3n/2 


eT{£T-[£L(R')+£L(R)]/2}  +  q(t  2\ 


;-M[R'-R(t)]2/2t 

(27 rr/ /i)3n/2 


er{i?T-[EL(R')+SL(R)]/2}  +  q(t 


(86) 


Eq.  (^)  again  can  be  viewed  in  our  general  framework  by  defining  the  probability  con¬ 
serving  generalization  of  Eq.  © 


P(R,,R,r)  = 


;-/4R'-R(t)]2/2t 

(27 rr/ /r)3n/2 


(87) 


and  the  remainder  of  the  Green  function  is  5[R/  —  R(r)]  with  prefactor 

g( R',  R,  r)  =  e^ET-[^(R')+^(R)]/2}; 


(88) 


29 


which  is  the  analog  of  Eq.  (□)■ 

When  employed  for  the  branching  Monte  Carlo  algorithm,  the  factorization  given  in 
differs  from  the  original  factorization  Eqs.  ([ 


Eqs.  (|87|)  and  (|88|)  differs  from  the  original  factorization  Eqs.  ([H])  and  (|77|)  in  two  respects: 
(a)  the  walkers  do  not  only  diffuse  but  also  drift  towards  the  important  regions,  i.e.,  in  the 
direction  in  which  |'0x(R)|  is  increasing;  and  (b)  the  branching  term  is  better  behaved  since 
it  depends  on  the  local  energy  rather  than  the  potential.  In  particular,  if  the  trial  function 
V>t(R)  obeys  cusp  conditions^  then  the  local  energy  at  particle  coincidences  is  finite  even 
though  the  potential  may  be  infinite.  If  one  were  so  fortunate  that  ^x(R)  is  the  exact 
ground  state,  the  branching  factor  would  reduce  to  a  constant,  which  can  be  chosen  to  be 
unity  by  choosing  E?  to  be  the  exact  energy  of  that  state. 

The  expressions,  as  written,  explicitly  contain  the  expression  R(r),  which  has  to  be 
obtained  by  integration  of  the  velocity  V(R).  Since  we  are  using  a  short-time  expansion 
anyway,  this  exact  expression  may  be  replaced  by  the  approximation 


R(r)  =  R  +  V(R)t  +  0(t2 


(89) 


In  the  improvements  discussed  below,  designed  to  reduce  time-step  error,  this  expression 
is  improved  upon  so  that  regions  where  V  diverges  do  not  make  large  contributions  to  the 
overall  time-step  error. 


2.  Fixed-Node  Approximation 


The  absolute  ground  state  of  a  Hamiltonian  with  particle  exchange  symmetry  is  bosonic. 
Consequently,  unless  one  starts  with  a  fermionic,  i.e.,  antisymmetric  wavefunction  and  im¬ 
plements  the  power  method  in  a  way  that  respects  this  antisymmetry,  the  power  method 
will  reduce  the  spectral  weight  of  the  fermionic  eigenstate  to  zero  relative  to  the  weight  of 
the  bosonic  state.  The  branching  algorithm  described  above  assumes  all  weights  are  pos¬ 
itive  and  therefore  is  incompatible  with  the  requirement  of  preserving  antisymmetry.  The 
algorithm  needs  modification,  if  we  are  interested  in  the  fermionic  ground  state. 

If  the  nodes  of  the  fermionic  ground  state  were  known,  they  could  be  imposed  as  boundary 
conditions!^  and  the  problem  could  be  dealt  with  by  solving  the  Schrodinger  equation  within 
a  single,  connected  region  bounded  by  the  nodal  surface,  a  region  we  shall  refer  to  as  a  nodal 
pocket.  Since  all  the  nodal  pockets  of  the  ground  state  of  a  fermionic  system  are  equivalent, 
this  would  yield  the  exact  solution  of  the  problem  everywhere.  Unfortunately,  the  nodes 
of  the  wavefunction  of  an  n-particle  system  form  a  (3 n  —  l)-dimensional  surface,  which 
should  not  be  confused  with  the  nodes  of  single-particle  orbitals.  Of  this  full  surface,  in 
general,  only  the  (3n  —  3)-dimensional  subset,  corresponding  to  the  coincidence  of  two  like- 
spin  particles,  is  known.  Hence,  we  are  forced  to  employ  an  approximate  nodal  surface  as 
a  boundary  condition  to  be  satisfied  by  the  solution  of  the  Schrodinger  equation.  This  is 
called  the  fixed-node  approximation.  Usually,  one  chooses  for  this  purpose  the  nodal  surface 
of  an  optimized  trial  wave  functional,  and  such  nodes  can  at  times  yield  a  very  accurate 
results  if  sufficient  effort  is  invested  in  optimizing  the  trial  wavefunction. 

Since  the  imposition  of  the  boundary  condition  constrains  the  solution,  it  is  clear  that 
the  fixed-node  energy  is  an  upper  bound  on  the  true  fermionic  ground  state  energy.  In 
diffusion  Monte  Carlo  applications,  the  fixed-node  energy  typically  has  an  error  which  is 
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five  to  ten  times  smaller  than  the  error  of  the  variational  energy  corresponding  to  the  same 
trial  wavefunction,  though  this  can  vary  greatly  depending  on  the  system  and  the  trial 
wavefunction  employed. 

For  the  Monte  Carlo  implementation  of  this  approach  one  has  to  use  an  approximate 
Green  function,  which,  as  we  have  seen,  may  be  obtained  by  iteration  of  a  short  time 
approximation.  To  guide  the  choice  of  an  approximant  accurate  over  a  wide  time  range,  it  is 
useful  to  consider  some  general  properties  of  the  fixed-node  approximation.  Mathematically, 
the  latter  amounts  to  solution  of  the  Schrodinger  equation  in  a  potential  that  equals  the 
original  physical  potential  inside  the  nodal  pocket  of  choice,  and  is  infinite  outside.  The 
corresponding  eigenstates  of  the  Hamiltonian  are  continuous  and  vanish  outside  the  nodal 
pocket.  Note  that  the  solution  can  be  analytically  continued  outside  the  initial  nodal  pocket 
only  if  the  nodal  surface  is  exact,  otherwise  there  is  a  derivative  discontinuity  at  the  nodal 
surface.  These  properties  are  shared  by  the  Green  function  consistent  with  the  boundary 
conditions.  This  can  be  seen  by  writing  down  the  spectral  decomposition  for  the  evolution 
operator  in  the  position  representation 

G(R',R,r)  =  (R'|e-TW|R)  =^^(R')e-TEi^(R)  (90) 


where  the  are  the  eigenstates  satisfying  the  required  boundary  conditions.  For  notational 
convenience  only,  we  have  assumed  that  the  spectrum  has  a  discrete  part  only  and  that 
the  wavefunctions  can  be  chosen  to  be  real.  The  Green  function  vanishes  outside  the  nodal 
pocket  and  generically  vanishes  linearly  at  the  nodal  surface,  as  do  the  wavefunctions. 

The  Green  function  of  interest  in  practical  applications  is  the  one  corresponding  to 
importance  sampling,  the  similarity  transform  of  Eq.  (|90|) 

G(R',R,t)=V-t(R')  (R'|e-r«|R)— t—  (91) 

0t(R) 


This  Green  function  vanishes  at  the  nodes  quadratically  in  its  first  index,  which,  in  the 
Monte  Carlo  context  is  the  one  that  determines  to  which  state  a  transition  is  made  from  a 
given  initial  state. 

The  approximate  Green  functions  of  Eqs.  (j79|)  and  (^)  have  tails  that  extend  beyond 
the  nodal  surface  and,  consequently,  walkers  sampled  from  these  Green  functions  have  a 
finite  probability  of  attempting  to  cross  the  node.  Since  expectation  values  ought  to  be 
calculated  in  the  r  — >  0  limit  the  relevant  quantity  to  consider  is  what  fraction  of  walkers 
attempt  to  cross  the  node  per  unit  time  in  the  r  — >  0  limit.  If  the  Green  function  of  Eq.  (79) 
is  employed,  this  is  a  finite  number,  whereas,  if  the  importance-sampled  Green  function  of 
Eq.  (|86j)  is  employed,  no  walkers  cross  the  surface  since  the  velocity  V  is  directed  away  from 
the  nodes  and  diverges  at  the  nodes.  In  practice,  of  course,  the  calculations  are  performed 
for  finite  r,  but  the  preceding  observation  leads  to  the  conclusion  that  in  the  former  case  it 
is  necessary  to  reduce  to  zero  the  weight  of  a  walker  that  has  crossed  a  node,  i.e.,  to  kill  the 
walker,  while  in  the  latter  case  one  can  either  kill  the  walkers  or  reject  the  proposed  move, 
since  in  the  r  — >  0  limit  they  yield  the  same  result. 

We  now  argue  that,  for  hnite  r,  rejecting  moves  is  the  choice  with  the  smaller  time-step 
error  when  the  importance-sampled  Green  function  is  employed.  Sufficiently  close  to  a  node, 
the  component  of  the  velocity  perpendicular  to  the  node  dominates  all  other  terms  in  the 
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Green  function  and  it  is  illuminating  to  consider  a  free  particle  in  one  dimension  subject  to 
the  boundary  condition  that  0  have  a  node  at  x  =  0.  The  exact  Green  function  for  this 
problem  is 


G(x',X,t)  =  fe-Mx'-*)2/2r  _  e-rtx’+x)*/2n 


while  the  corresponding  importance  sampled  Green  function  is 


x 


G(x' ,x,t)  =  ~  \r~^x'  ~x)2  !2t  —  p~u{x'+x)2/ 2t1 


xJ2ht  /  n 


(92) 


(93) 


We  note  that  the  integral  of  the  former,  over  the  region  x  >  0,  is  less  than  one  and  decreases 
with  time.  In  terms  of  the  usual  language  used  for  diffusion  problems,  this  is  because  of 
absorption  at  the  x  =  0  boundary.  In  our  case,  this  provides  the  mathematical  justification 
for  killing  the  walkers  that  cross.  On  the  other  hand,  the  integral  of  the  Green  function  of 
Eq.  (plf)  equals  one.  Consequently,  for  finite  r  it  seems  likely  that  algorithms  that  reject 
moves  across  the  node,  such  as  the  one  discussed  in  Section  |VI  B|  yield  a  better  approximation 
than  algorithms  that  kill  the  walkers  that  cross  the  node. 

As  mentioned,  it  can  be  shown  that  all  the  nodal  pockets  of  the  ground  state  of  a 
fermionic  system  are  equivalent  and  trial  wavefunctions  are  constructed  to  have  the  same 
property.  Consequently,  the  Monte  Carlo  averages  will  not  depend  on  the  initial  distribution 
of  the  walkers  over  the  nodal  pockets.  The  situation  is  more  complicated  for  excited  states, 
since  different  nodal  pockets  of  excited-state  wavefunctions  are  not  necessarily  equivalent, 
neither  for  bosons  nor  for  fermions.  Any  initial  state  with  walkers  distributed  randomly 
over  nodal  pockets  pockets,  will  evolve  to  a  steady  state  distribution  with  walkers  only  in 
the  pocket  with  the  lowest  average  local  energy,  at  least  if  we  ignore  multiple-node  crossings 
and  assume  a  sufficiently  large  number  of  walkers,  so  that  fluctuations  in  the  average  local 
energy  can  be  ignored. 


3.  Problems  with  Simple  Diffusion  Monte  Carlo 

The  diffusion  Monte  Carlo  algorithm  corresponding  to  Eq.  (|86D  is  in  fact  not  viable  for  a 
wavefunction  with  nodes  for  the  following  two  reasons.  Firstly,  in  the  vicinity  of  the  nodes 
the  local  energy  of  the  trial  function  diverges  inversely  proportional  to  the  distance  to 
the  nodal  surface.  For  nonzero  r,  there  is  a  nonzero  density  of  walkers  at  the  nodes.  Since 
the  nodal  surface  for  a  system  with  n  electrons  is  3 n  —  1  dimensional,  the  variance  of  the 
local  energy  diverges  for  any  finite  r.  In  fact,  the  expectation  value  of  the  local  energy 
also  diverges,  but  only  logarithmically.  Secondly,  the  velocity  of  the  electrons  at  the  nodes 
diverges  inversely  as  the  distance  to  the  nodal  surface.  The  walkers  that  are  close  to  a 
node  at  one  time  step,  drift  at  the  next  time  step  to  a  distance  inversely  proportional  to 
the  distance  from  the  node.  This  results  in  a  charge  distribution  with  a  component  that 
falls  off  as  the  inverse  square  of  distance  from  the  atom  or  molecule,  whereas  in  reality  the 
decay  is  exponential.  These  two  problems  are  often  remedied  by  introducing  cut-offs  in  the 
values  of  the  local  energy  and  the  velocityil£l,  chosen  such  that  they  have  no  effect  in  the 
r  — >  0  limit,  so  that  the  results  extrapolated  to  r  =  0  are  correct.  In  the  next  section  better 
remedies  are  presented. 
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B.  Improved  Diffusion  Monte  Carlo  Algorithm 


1.  The  limit  of  Perfect  Importance  Sampling 


In  the  limit  of  perfect  importance  sampling,  that  is  if  0t(R)  =  ^0(R),  the  energy  shift 
i?T  can  be  chosen  such  that  the  branching  term  in  Eq.  (|(J)  vanishes  identically  for  all  R. 
In  this  case,  even  though  the  energy  can  be  obtained  with  zero  variance,  the  steady  state 
distribution  of  the  diffusion  Monte  Carlo  algorithm  discussed  above  is  only  approximately 
the  desired  distribution  iff,  because  of  the  time-step  error  in  the  Green  function.  However, 
since  one  has  an  explicit  expression,  iff,  for  the  distribution  to  be  sampled,  it  is  possible  to 
use  the  Metropolis  algorithm,  described  in  Section  pi  A],  to  sample  the  desired  distribution 
exactly.  Although  the  ideal  0t(R)  =  0o(R)  is  never  achieved  in  practice,  this  observation 
leads  one  to  devise  an  improved  algorithm  that  can  be  used  when  moderately  good  trial 
wavefunctions  are  known. 

If  for  the  moment  we  ignore  the  branching  term  in  Eq.  (BUI),  then  we  have  the  equation 


-_Lvy  +  v.(vf)  =  -f. 


(94) 


This  equation  has  a  known  steady-state  solution  f  =  if  f  for  any  if t,  which  in  the  limit  of 
perfect  importance  sampling  is  the  desired  distribution.  However,  the  approximate  drift- 
diffusion  Green  function  used  in  the  Monte  Carlo  algorithm  defined  by  Eq.  (p6|)  without  the 
branching  factor,  is  not  the  exact  Green  function  of  Eq.  (|94|).  Therefore,  for  any  finite  time 
step  r,  we  do  not  obtain  iff  as  a  steady  state,  even  in  the  ideal  case.  Following  Reynolds  et 
alxB,  we  can  change  the  algorithm  in  such  a  way  that  it  does  sample  iff  in  the  ideal  case, 
which  also  reduces  the  time-step  error  in  nonideahpractical  situations.  This  is  accomplished 
by  using  a  generalized!^  E!  Metropolis  algorithms.  The  approximate  drift-diffusion  Green 
function  is  used  to  propose  moves,  which  are  then  accepted  with  probability 


p  =  min 


( G(R,  R/,r)0T(R/)2  A 

UtR^rhMR)2’  J 


=  1-  9, 


(95) 


in  accordance  with  the  detailed  balance  condition. 

As  was  shown  above,  the  true  fixed-node  Green  function  vanishes  outside  the  nodal 
pocket  of  the  trial  wavefunction.  However,  since  we  are  using  an  approximate  Green  function, 
moves  across  the  nodes  will  be  proposed  for  any  hnite  r.  To  satisfy  the  boundary  conditions 
of  the  fixed-node  approximation  these  proposed  moves  are  always  rejected. 

If  we  stopped  here,  we  would  have  an  exact  and  efficient  variational  Monte  Carlo  algo¬ 
rithm  to  sample  from  iff.  Now,  we  reintroduce  the  branching  term  to  convert  the  steady- 
state  distribution  from  iff  to  ifTifo ■  This  is  accomplished  by  reweighting  the  walkers  with 
the  branching  factor  [see  Eq.  (|86D1 


Aw 


exp{|[S'(R')  +  S^R^Teff}  for  an  accepted  move, 
exp[S'(R)reff]  for  a  rejected  move, 


(96) 


where  S  is  defined  in  Eq.  fl82l).  An  effective  time  step  re g,  which  will  be  defined  presently, 
is  required  because  the  Metropolis  algorithm  introduces  a  hnite  probability  of  not  moving 
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forward  and  rejecting  the  proposed  configuration.  Before  defining  reS,  we  note  that  an 
alternative  to  expression  (^)  is  obtained  by  replacing  the  two  reweighting  factors  by  a 
single  expression, 


Aw  =  exp 


{|(S(R')+5(R))+9S(R)}t,„ 


for  all  moves 


(97) 


This  expression,  written  down  with  Eq.  (|26|)  in  mind,  yields  somewhat  smaller  fluctuations 
and  time-step  error  than  expression  (|96l). 

Following  Reynolds  et  alSB,  an  effective  time  step  reg  is  introduced  in  Eq.  ( |97|)  to  account 
for  the  changed  rate  of  diffusion.  We  set 


(p  A R2) 

TeS  ~  T  (A R2) 

where  the  angular  brackets  denote  the  average  over  all  attempted  moves,  and  A R  are  the 
displacements  resulting  from  diffusion.  This  equals  (Af?2)accepted/ (A R2)  but  again  has  some¬ 
what  smaller  fluctuations. 

An  estimate  of  re g-  is  readily  obtained  iteratively  from  sets  of  equilibration  runs.  During 
the  initial  run,  reg  is  set  equal  to  r.  For  the  next  runs,  the  value  of  reg  is  obtained  from 
the  values  of  reg  computed  with  Eq.  (|38|)  during  the  previous  equilibration  run.  In  practice, 
this  procedure  converges  in  two  iterations,  which  typically  consume  less  than  2%  of  the  total 
computation  time.  Since  the  statistical  errors  in  reg  affect  the  results  obtained,  the  number 
of  Monte  Carlo  steps  performed  during  the  equilibration  phase  needs  to  be  sufficiently  large 
that  this  is  not  a  major  component  of  the  overall  statistical  error. 

The  value  of  reg  is  a  measure  of  the  rate  at  which  the  Monte  Carlo  process  generates 
uncorrelated  configurations,  and  thus  a  measure  of  the  efficiency  of  the  computation.  Since 
the  acceptance  probability  decreases  when  r  increases,  reg  has  a  maximum  as  a  function  of 
r.  However,  since  the  time-step  error  increases  with  r,  it  is  advisable  to  use  values  of  r  that 
are  smaller  than  this  “optimum” . 

Algorithms  that  do  not  exactly  simulate  the  equilibrium  distribution  of  the  drift-diffusion 
equation  if  the  branching  term  is  suppressed,  i.e.,  algorithms  that  do  not  use  the  Metropolis 
accept/reject  mechanism,  can  for  sufficiently  large  r  have  time-step  errors  that  make  the 
energy  estimates  higher  than  the  variational  energy.  On  the  other  hand,  if  the  drift-diffusion 
terms  are  treated  exactly  by  including  an  accept/reject  step,  the  energy,  evaluated  for  any 
r,  must  lie  below  the  variational  energy,  since  the  branching  term  enhances  the  weights  of 
the  low-energy  walkers  relative  to  that  of  the  high-energy  walkers. 


2.  Persistent  Configurations 

As  mentioned  above,  the  accept/reject  step  has  the  desirable  feature  of  yielding  the  exact 
electron  distribution  in  the  limit  that  the  trial  function  is  the  exact  ground  state.  However, 
in  practice  the  trial  function  is  less  than  perfect  and  as  a  consequence  the  accept/reject 
procedure  can  lead  to  the  occurrence  of  persistent  configurations,  as  we  now  discuss. 

For  a  given  configuration  R,  consider  the  quantity  Q  =  (qAw),  where  q  and  Aw  are  the 
rejection  probability  and  the  branching  factor  given  by  Eqs.  (|95|)  and  The  average  in 
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FIG.  2.  Illustration  of  the  persistent  configuration  catastrophe.  The  dotted  horizontal  line  is 
the  true  fixed-node  energy  for  a  simple  Be  wavefunction  extrapolated  to  r  =  0. 


the  definition  of  Q  is  over  all  possible  moves  for  the  configuration  R  under  consideration. 
If  the  local  energy  at  R  is  relatively  low  and  if  reg  is  sufficiently  large,  Q  may  be  in  excess 
of  one.  In  that  case,  the  weight  of  the  walker  at  R,  or  more  precisely,  the  total  weight  of 
all  walkers  in  that  configuration  will  increase  with  time,  except  for  fluctuations,  until  the 
time-dependent  trial  energy  Eg  adjusts  downward  to  stabilize  the  total  population.  This 
population  contains  on  average  a  certain  number  of  copies  of  the  persistent  configuration. 
Since  persistent  configurations  must  necessarily  have  an  energy  that  is  lower  than  the  true 
fixed-node  energy,  this  results  in  a  negatively  biased  energy  estimate.  The  persistent  con¬ 
figuration  may  disappear  because  of  fluctuations,  but  the  more  likely  occurrence  is  that  it 
is  replaced  by  another  configuration  that  is  even  more  strongly  persistent,  i.e.,  one  that  has 
an  even  larger  value  of  Q  —  (qAw).  This  process  produces  a  cascade  of  configurations  of 
ever  decreasing  energies.  Both  sorts  of  occurrences  are  demonstrated  in  Fig.  |2|.  Persistent 
configurations  are  most  likely  to  occur  near  nodes,  or  near  nuclei  if  i[}g  does  not  obey  the 
cusp  conditions.  Improvements  to  the  approximate  Green  function  in  these  regions,  as  dis¬ 
cussed  in  the  next  section,  help  to  reduce  greatly  the  probability  of  encountering  persistent 
configurations  to  the  point  that  they  are  never  encountered  even  in  long  Monte  Carlo  runs. 

Despite  the  fact  that  the  modifications  described  in  the  next  section  eliminate  persistent 
configurations  for  the  systems  we  have  studied,  it  is  clearly  desirable  to  have  an  algorithm 
that  cannot  display  this  pathology  even  in  principle.  One  possible  method  is  to  replace  reg 
in  Eq.  (J96|)  by  r  for  an  accepted  move  and  by  zero  for  a  rejected  move.  This  ensures  that 
Aw  never  exceeds  unity  for  rejected  moves,  hence  eliminating  the  possibility  of  persistent 
configurations.  Further,  this  has  the  advantage  that  it  is  not  necessary  to  determine  reg. 
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Other  possible  ways  to  eliminate  the  possibility  of  encountering  persistent  configurations  are 
discussed  in  Ref.  |50. 


3.  Singularities 


The  number  of  iterations  of  Eq.  (74)  required  for  the  power  method  to  converge  to  the 
ground  state  grows  inversely  with  the  time  step  r.  Thus,  the  statement  made  above,  viz. 
that  the  Green  function  of  Eq.  ([F^)  is  in  error  only  to  0(t2),  would  seem  to  imply  that  the 
errors  in  the  electron  distribution  and  the  averages  calculated  from  the  short-time  Green 
function  are  of  0(t).  However,  the  presence  of  non-analyticities  and  divergences  in  the 
local  energy  and  the  velocity  may  invalidate  this  argument:  the  short-time  Green  function 
may  lack  uniform  convergence  in  r  over  3n- dimensional  configuration  space.  Consequently, 
an  approximation  that  is  designed  to  be  better  behaved  in  the  vicinity  of  singularities  and 
therefore  behaves  more  uniformly  over  space  may  outperform  an  approximation  that  is 
correct  to  a  higher  order  in  r  for  generic  points  in  configuration  space,  but  ignores  these 
singularities. 

Next  we  discuss  some  important  singularities  that  one  may  encounter  and  their  impli¬ 
cations  for  the  diffusion  Monte  Carlo  algorithm.  The  behavior  of  the  local  energy  E l  and 
velocity  V  near  nodes  of  ifa  are  described  in  Table  |.  Although  the  true  wave  function  ^q 
has  a  constant  local  energy  Eq  at  all  points  in  configuration  space,  the  local  energy  of 
diverges  at  most  points  of  the  nodal  surface  of  "0t  for  almost  all  the  V;t  that  are  used  in  prac¬ 
tice,  and  it  diverges  at  particle  coincidences  (either  electron-nucleus  or  electron- electron)  for 
wave  functions  that  fail  to  obey  cusp  conditions^.  The  velocity  V  diverges  at  the  nodes  and 
for  the  Coulomb  potential  has  a  discontinuity  at  particle  coincidences  both  for  approximate 
and  for  the  true  wavefunction.  For  the  nodeless  wavefunction  of  Lennard- Jones  particles  in 
their  ground  state,!  V  diverges  as  r-6  in  the  inter-particle  distance  r.  Since  the  only  other 
problem  for  these  bosonic  systems  is  the  divergence  of  E l  at  particle  coincidences,  we  con¬ 
tinue  our  discussion  for  electronic  systems  and  refer  the  interested  reader  to  the  literature!! 
for  details. 


TABLE  I.  Behavior  of  the  local  energy  E l  and  velocity  v  as  a  function  of  the  distance  R±  of 
an  electron  to  the  nearest  singularity.  The  behavior  of  various  quantities  is  shown  for  an  electron 
approaching  a  node  or  another  particle,  either  a  nucleus  or  an  electron.  The  singularity  in  the  local 
energy  at  particle  overlap  is  only  present  for  a  V’T  that  fails  to  satisfy  the  cusp  conditions. 


Region 

Local  energy 

Velocity 

Nodes 

El  ~  ±-^r-  for 

v~  El 

El  =  Eq  for  i/jq 

Electron- 

El  ~  y  for  some  V’T 

v  has  a  discontinuity 

nucleus  /  electron 

Eh  =  Eq  for  ip0 

for  both  and  tpQ 

The  divergences  in  the  local  energies  cause  large  fluctuations  in  the  population  size: 
negative  divergences  lead  to  large  local  population  densities  and  positive  divergences  lead  to 
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Schematic  comparing  true  and  approximate  Green  functions 


X 


FIG.  3.  Schematic  comparing  the  qualitative  behaviors  of  the  true  G  with  the  approximate  G  of  Eq.  (|8(il) 
of  an  electron  that  is  located  at  x  =  —0.2  at  time  t  =  0  and  evolves  in  the  presence  of  a  nearby  nucleus 
located  at  x  =  0.  The  Green  functions  are  plotted  for  three  times:  t\  <  t%  <  1 3. 


small  ones.  The  divergence  of  V  at  the  nodes  typically  leads  to  a  proposed  next  state  of  a 
walker  in  a  very  unlikely  region  of  configuration  space  and  is  therefore  likely  to  be  rejected. 
The  three-dimensional  velocity  of  an  electron  which  is  close  to  a  nucleus  is  directed  towards 
the  nucleus.  Hence  the  true  Green  function,  for  sufficiently  long  time,  exhibits  a  peak  at 
the  nucleus,  but  the  approximate  Green  function  of  Eq.  (|86D  cause  electrons  to  overshoot 
the  nucleus.  This  is  illustrated  in  Fig.  |3].  where  we  show  the  qualitative  behavior  of  the 
true  Green  function  and  of  the  approximate  Green  function  for  an  electron  which  starts  at 
x  =  —0.2  in  the  presence  of  a  nearby  nucleus  at  x  =  0.  At  short  time  A,  the  approximate 
Green  function  of  Eq.  (^6])  agrees  very  well  with  the  true  Green  function.  At  a  longer  time 
f 2 ,  the  true  Green  function  begins  to  develop  a  peak  at  the  nucleus  which  is  absent  in  the 
approximate  Green  function,  wheras  at  a  yet  longer  time  £3,  the  true  Green  function  is 
peaked  at  the  nucleus  while  the  approximate  Green  function  has  overshot  the  nucleus. 

The  combined  effect  of  these  nonanalyticities  is  a  large  time-step  error,  which  can  be 
of  either  sign  in  the  energy,  and  large  statistical  uncertainty  in  the  computed  expectation 
values.  We  now  give  a  brief  description  of  how  these  nonanalyticities  are  treated.  The  diver¬ 
gence  of  the  local  energy  at  particle  coincidences  is  cured  simply  by  employing  wavefunctions 
that  obey  cusp  conditions^.  The  other  nonanalyticities  are  addressed  by  employing  a  mod¬ 
ification  of  the  Green  function  of  Eq.  (^6|)  such  that  it  incorporates  the  divergence  of  E l 
and  V  at  nodes  and  the  discontinuity  in  V  at  particle  coincidences  but  smoothly  reduces  to 
Eq.  (|86|)  in  the  short-time  limit  or  in  the  limit  that  the  nearest  nonanalyticity  is  far  away. 
The  details  can  be  found  in  Ref.  |50|.  The  modified  algorithm  has  a  time-step  error  which 
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is  two  to  three  orders  of  magnitude  smaller!!  than  the  simple  algorithm  corresponding  to 
Eq.  (|86l)  with  cutoffs  imposed  on  EL  and  V. 

We  have  used  the  application  to  all-electron  electronic  structure  calculations  to  illustrate 
the  sort  of  problems  that  can  lead  to  large  time-step  errors  and  their  solution.  Other 
systems  may  exhibit  only  a  subset  of  these  problems  or  a  modified  version  of  them.  For 
example,  in  calculations  of  bosonic  clusters!  there  are  no  nodes  to  contend  with,  while,  in 
electronic  structure  calculations  employing  pseudopotentia  Ls§  i,  or  pseudo-Hamiltoniansi! 
the  potential  need  not  diverge  at  electron- nucleus  coincidences.  We  note,  in  passing,  that 
the  use  of  the  latter  methods  has  greatly  extended  the  practical  applicability  of  quantum 
Monte  Carlo  methods  to  relatively  large  systems  of  practical  interest!!’!!  but  at  the  price  of 
introducing  an  additional  approximation!^. 


VII.  CLOSING  COMMENTS 


The  material  presented  above  was  selected  to  describe  from  a  unified  point  of  view 
Monte  Carlo  algorithms  as  employed  in  seemingly  unrelated  areas  in  quantum  and  statistical 
mechanics.  Details  of  applications  were  given  only  to  explain  general  ideas  or  important 
technical  problems,  such  as  encountered  in  diffusion  Monte  Carlo.  We  ignored  a  whole 
body  of  literature,  but  we  wish  to  just  mention  a  few  topics.  Domain  Green  function 
Monte  Carlo!!-!!  is  one  that  comes  very  close  to  topics  that  were  covered.  In  this  method 
the  Green  function  is  sampled  exactly  by  iterating  upon  an  approximate  Green  function. 
Infinite  iteration,  which  the  Monte  Carlo  method  does  in  principle,  produces  the  exact 
Green  function.  Consequently,  this  method  lacks  a  time-step  error,  and  in  this  sense  has  the 
advantage  of  being  exact.  In  practice,  there  are  other  reasons,  besides  the  time-step  error 
that  force  the  algorithm  to  move  slowly  through  state  space,  and  currently  the  available 
algorithms  seem  to  be  less  efficient  than  diffusion  Monte  Carlo,  even  when  one  accounts  for 
the  effort  required  to  perform  the  extrapolation  to  vanishing  time  step. 

Another  area  that  we  just  touched  upon  in  passing  is  path  integral  Monte  Carlo.!!  Here 
we  remind  the  reader  that  path  integral  Monte  Carlo  is  a  particularly  appealing  alternative 
for  the  evaluation  of  matrix  elements  such  as  such  as  :P')  in  Eq.  (15).  The  advantage  of 
this  method  is  that  no  products  of  weights  appear,  but  the  disadvantage  is  that  it  seems 
to  be  more  difficult  to  move  rapidly  through  state  space.  This  is  a  consequence  of  the  fact 
that  branching  algorithms  propagate  just  a  single  time  slice  through  state  space,  whereas 
path  integral  methods  deal  with  a  whole  stack  of  slices,  which  for  sampling  purposes  tends 
to  produce  a  more  rigid  object.  Finally,  it  should  be  mentioned  that  we  ignored  the  vast 
literature  on  quantum  lattice  systems^. 

In  splitting  the  evolution  operator  into  weights  and  probabilities  [see  Eq.  (|33|)1  we  as¬ 
sumed  that  the  weights  were  non-negative.  To  satisfy  this  requirement,  the  fixed-node 
approximation  was  employed  for  fermionic  systems.  An  approximation  in  the  same  vein  is 
the  fixed-phase  approximation,!!  which  allows  one  to  deal  with  systems  in  which  the  wave- 
function  is  necessarily  complex  valued.  The  basic  idea  here  is  analogous  to  that  underlying 
the  fixed-node  approximation.  In  the  latter,  a  trial  function  is  used  to  approximate  the  nodes 
while  diffusion  Monte  Carlo  recovers  the  magnitude  of  the  wavefunction.  In  the  fixed  phase 
approximation,  the  trial  function  is  responsible  for  the  phase  and  Monte  Carlo  produces  the 
magnitude  of  the  wavefunction. 
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